Abstract: In this paper we consider an interacting two-agent
sequential decision-making problem consisting of a Markov 
source process, a causal encoder with feedback, and a causal 
decoder. Motivated by a desire to foster links between control 
and information theory, we augment the standard formulation 
by considering general alphabets and a cost function operating 
on current and previous symbols. Using dynamic programming, 
we provide a structural result whereby an optimal scheme exists 
that operates on appropriate sufficient statistics. We emphasize 
an example where the decoder alphabet lies in a space of beliefs 
on the source alphabet, and the additive cost function is a log 
likelihood ratio pertaining to sequential information gain. We 
also consider the inverse optimal control problem, where a fixed 
encoder/decoder pair satisfying statistical conditions is shown to 
be optimal for some cost function, using probabilistic matching. 
We provide examples of the applicability of this framework to 
communication with feedback, hidden Markov models and the 
nonlinear filter, decentralized control, brain-machine interfaces, 
and queuing theory. 



I. Introduction 

Many current and future societal problems involve designing 
and understanding networks of sequential decision-making en- 
tities cooperating in an uncertain environment. Some of these 
entities may be physical/biological agents, whereas others 
might be computerized systems. For example, cyber-physical 
systems feature interacting networks of physical processes that 
are noisily sensed and actuated by computational algorithms. 
The mammalian brain comprises another example, where the 
cooperative goals of sensing, perception, learning, and eliciting 
behavior are achieved via coupled neural systems that interact 
via signaling across a noisy biological medium. 

From an engineering system designer's vantage point, obtaining
optimal coordination strategies for a network of interacting
decision-makers is in general computationally intractable [1].
For a class of small networks (e.g. comprising a specific
interaction structure between an encoder and a decoder), and
an asymptotic performance objective, fundamental limits of
performance can be addressed using the information-theoretic
concepts of communication and rate-distortion [2]. Identifying
optimal strategies for sequential decision-making under
uncertainty for a single agent, on the other hand, is traditionally
addressed with the control-theoretic principles of Markov decision
theory [3].



From a scientific vantage point, the joint statistical dynamics
between interacting decision-makers can provide insight
into the cost or utility they are cooperatively optimizing.
For small networks (e.g. an encoder and decoder) with a
limited statistical dynamics interaction structure, this has been
addressed with the information-theoretic principle of
source-channel probabilistic matching [4]. Inverse optimal control
theory identifies cost functions for which a fixed
strategy of one decision-maker is optimal [5] and has been
used in neural [6], [7] and cognitive science [8] applications.

It appears evident that understanding this class of problems
for more general objectives and interaction structures can
utilize insights from both information and control theory, but
the differences in their philosophical starting points are striking:
Information Theory: problems traditionally specify a large but
fixed time horizon $n$ for which some decisions are not made
until this terminal point. Even in problems where neither an
observation nor a decision variable lies in a time-horizon
dependent set (e.g. reproducing a source over a noisy channel
with a fidelity criterion), Shannon's 'separation theorem' [9]
shows that for very large $n$, it is sufficient to first decompose
the problem into sub-problems, each of which contains
some observations or decision variables with time
horizon-dependent alphabet structure (e.g. of size $2^{nR}$) and a
performance objective pertaining to constrained extremizing of
mutual information. As such, traditional information-theoretic
problem formulations have the following starting point:

(a) time horizon-dependent alphabets

(b) some decisions made at final stage of long time horizons 

(c) performance objective: extremize mutual information 

Control Theory: Markov decision theory problems typically
involve observations of 'state variables' whose future statistics
are impacted by their current values and the current 'decision
variable' that is under causal control of a decision-maker.
The alphabet sizes of observations and decision variables are
typically unrelated to the time horizon $n$ of the problem. Moreover,
at each time step, a decision must be made based upon
causal information up to that time. Lastly, the performance
objective is to minimize an expected sum of costs, each of
which operates on current state, observation, and decision
variables. Structural results are typically desirable in such
settings because they develop conditions for the existence
of explicit, non-random strategies that operate on sufficient
statistics. Succinctly, we can state this as follows:

(a) time horizon-independent alphabets

(b) decisions made sequentially based on causal information 

(c) performance objective: sum of costs operating on current 
observations and decision variables 

So these two philosophies have striking differences. Consider
the class of 'causal coding/decoding' problems, which further
demonstrates this. At each time step $i$, the causal encoder's
decision variable is the input $X_i \in \mathsf{X}$ to a noisy channel that is
a causal function of source inputs $(W_1, \dots, W_i)$ and the noisy
channel outputs $Y_1, \dots, Y_{i-1}$: $X_i = e_i(W^i, Y^{i-1})$. The causal
decoder's decision variable is a 'source estimate' $Z_i \in \mathsf{Z}$
that is a causal function of channel outputs $(Y_1, \dots, Y_i)$:
$Z_i = d_i(Y^i)$. They jointly design their strategies $\pi = (e, d)$
to minimize a function $J_{n,\pi}$ pertaining to an expected sum of
costs:

$$J_{n,\pi} = \mathbb{E}\left[ \sum_{i=1}^{n} g(W_i, Z_i) \right] \qquad (1)$$

Fig. 1. Basic problem setup: an optimal causal coding/decoding problem.


Some aspects of the problem appear to make it amenable
to a control-theoretic analysis: (a) the source alphabet $\mathsf{W}$ is
unrelated to the time horizon $n$, (b) decisions are made
sequentially, and (c) the performance objective (1)
operates additively on observations/decision variables in the
vicinity of each time $i$, as compared to only at the final time
horizon $n$. The presence of the noisy channel in the loop possibly
makes it amenable to an information-theoretic analysis:
mutual information could plausibly provide tight bounds on
attainable costs. On the other hand, neither agent's observations
at any time point are a nested version of the other's and so
they have a 'non-classical' information structure [10], making
this a 'hard' control problem. Analogously, the 'hard' delay
constraint pertaining to causal decoding and the typical 'hard
decision' assumption of $W, Z$ being in discrete, time-horizon
independent alphabets typically render information-theoretic
techniques irrelevant to the understanding of these 'real-time'
problems [11], [12].

In this paper we consider a causal coding/decoding problem
where $W$ is a Markov source process. We consider additive
cost functions of the form $g(w_i, x_i, z_{i-1}, z_i)$. We do
not impose assumptions (e.g. finiteness) on alphabets of the
variables. Our motivation for this more general framework is
an example (Section I-A) motivated by feedback communication
where the source alphabet is continuous, the decoder
alphabet lies in a space of beliefs on the source alphabet, and
the additive cost function is a log likelihood ratio pertaining
to sequential information gain. Using dynamic programming,
we provide a structural result whereby an optimal scheme
exists that operates on appropriate sufficient statistics. We also
consider the inverse optimal control problem, where a fixed
encoder/decoder pair satisfying a sufficient statistical condition
is shown to be optimal for some cost function, using
probabilistic matching. We provide examples of the applicability
of this framework to communication with feedback, hidden
Markov models and the nonlinear filter, decentralized control,
brain-machine interfaces, and queuing theory.

A. Example: Communication over a Noisy Channel with 
Feedback and the Sequential Information Gain Cost 

We now consider the traditional feedback communication
model and how its assumptions - along with traditional
'real-time' problem assumptions - can be modified so that
fundamental limits are unchanged but the frameworks align.
Consider the traditional information-theoretic communication
model with feedback, consisting of an encoder, a decoder,
and a fixed block length $n$. The encoder has a message
$W \in \mathsf{W} = \{1, \dots, 2^{nR}\}$. It specifies $n$ inputs to the channel,
$X_1, \dots, X_n$. The channel is memoryless and non-anticipative,
where $P_{Y|X}(y|x)$ describes the statistics of the output given the
input. At each time step $i$, the encoder uses the message
$W$ and the previous channel outputs $Y_1, \dots, Y_{i-1}$
to specify the next channel input $X_i$. The decoder, at time $n$,
having acquired channel outputs $Y_1, \dots, Y_n$, specifies a single
decision, $\hat{W}_n \in \mathsf{W}$. The question asked in information theory
is: how large can $R$ be such that for sufficiently large $n$, there
exist encoders and decoders for which $P(\hat{W}_n \neq W) \to 0$? To
demonstrate the existence of such encoders and decoders, a
random coding argument and the law of large numbers
are typically invoked.

Recently, a development by Shayevitz & Feder [13], [14], [15]
has revisited a philosophically different way to frame
the feedback communication model - dating back to the 1960s
[16], [17], [18] - that has a more dynamical systems and
control-theoretic flavor. Consider the following changes to the standard
information-theoretic formulation, which make it more closely resemble
a causal coding/decoding problem, shown in Figure 2:

• Message Point: $W_i = W$ is equally likely over the interval
$\mathsf{W} = [0, 1]$.

• Decoder: At each $i$ (not only at time $n$), the decoder
specifies $Z_i = B_{i|i}$, the posterior belief about $W$ given
$Y_1, \dots, Y_i$: $B_{i|i}(A) \triangleq \mathbb{P}(W_i \in A \mid Y^i)$.

• Achievability: As shown in Fig. 3, with a set of uniform
quantizers $(q_{iR} : [0, 1] \to \{1, \dots, 2^{iR}\},\ i \geq 1)$, a rate $R$ is
achievable if $B_{i|i}(\{w : q_{iR}(w) = q_{iR}(W)\}) \to 1$.

Fig. 2. Communication of a message point W with causal feedback over a
memoryless channel.

Fig. 3. Representation of the posterior belief $B_i$ in terms of its density.



Note the importance of $\mathsf{W}$ being a continuous interval and $\mathsf{Z}$
being the space of beliefs on $\mathsf{W}$, $\mathcal{P}(\mathsf{W})$, in order for this
'real-time' flavored problem to relate to traditional
information-theoretic notions of achievability. The fundamental limits
under both formulations are equivalent [15], where achieving
capacity subject to channel input cost $\eta(x)$ constraints pertains
to maximizing the mutual information $I(W; Y^n)$ ID. A
time-invariant 'posterior matching' encoding scheme in Figure 2's
framework achieves capacity on general memoryless channels
[13]. Moreover, it is an optimal solution to a stochastic control
problem [19] whose cost function at each time step is related
to the sequential information gain $I(W; Y_i \mid Y^{i-1})$:

$$I(W; Y^n) = \sum_{i=1}^{n} I(W; Y_i \mid Y^{i-1}) = \sum_{i=1}^{n} \mathbb{E}\left[ \log \frac{dB_{i|i}}{dB_{i-1|i-1}}(W) \right]$$


Note that the sequential information gain term represents the
reduction in $W$'s uncertainty from the previous posterior belief
$B_{i-1|i-1}$ to the current one, and so each term in the sum operates
on $W$, $B_{i-1|i-1}$, and $B_{i|i}$. This alludes to a generalization
of causal coding/decoding problems with a cost function
$g(w_i, x_i, z_{i-1}, z_i)$, which in this case could plausibly be

$$g(w_i, x_i, z_{i-1}, z_i) = -\log \frac{dz_i}{dz_{i-1}}(w_i) + \alpha \eta(x_i), \qquad (2)$$

where $z_i \in \mathsf{Z} = \mathcal{P}(\mathsf{W})$ is a decision variable that can be
any belief about the message. In this manuscript, we plan to
build on this example and formulate general problems that
capture this generalization and further elucidate an interplay
between information theory and control theory within the
context of both designing optimal strategies and performing
inverse optimal control to characterize cost functions for which
fixed strategies are optimal.

B. Related Work 

The interplay between information and control theory has 
been established when treating a 'message point' as a real- 
valued point within the context of control over noisy channels 
E3, lEl, ES, ES, El, El, Ei and feedback information 
theory d. El, Qll, O, lEl, ED- 

Cost functions pertaining to log likelihood ratios and making
decision variables pertaining to beliefs have been used
within the context of sequential prediction [28], [29] and signal
compression/classification [30], relating thermodynamics and
information theory with inference on hidden Markov models
[31], [32], linearly solvable Markov decision problems [33],
and feedback information theory IH, |[T9], lES, ll36l . 

Causal coding-decoding problems akin to Figure 1 have
been studied extensively when $W$ is a Markov process, as it
typically enables the use of dynamic programming to demonstrate
the existence of optimal strategies where agents use
posterior beliefs as state variables. In the case of all-discrete
alphabets, this was demonstrated in [37], [38]. In the case of
all-real alphabets, $W$ a Gauss-Markov process, and $P_{Y|X}$ an
additive Gaussian channel, this was demonstrated in [39], [20,
Ch. 6], where additionally an explicit optimal scheme consisting
of an 'innovations-encoding' and 'minimum-mean-squared
error decoding' strategy was constructed. [36] considered the
case of $W$ discrete and the objective to maximize mutual
information, but $Z_i$ was not a decision variable. The case of
general alphabets $\mathsf{W}$ was considered in [40], [41], but the
purpose was quantization and thus $\mathsf{Z}$ was discrete and the
cost function balanced squared error distortion and quantizer
output entropy rate. Note that the information gain scenario
in Section I-A does not fall within any of the aforementioned
works.

Control-theoretic approaches to inverse optimal control
involving a single-agent system have been developed classically
for the case of a known policy [5], [42], where a
control-Lyapunov function acts as an optimal value function and
imposes constraints on candidate cost functions. Inverse
reinforcement learning additionally requires inferring the
single agent's policy based on experimental data [43], [44], [45], and
it has been applied to solve extremely challenging engineering
problems [46] and within the context of neural [6], [7] and
cognitive [8] science.

Information-theoretic approaches to inverse optimal control
for a two-agent (encoder/decoder) system relate costs
to likelihood ratios through the variational equations for the
rate-distortion and capacity-cost functions [47], [48], but the
problem formulations do not consider cost functions of the
form $g(w_i, x_i, z_{i-1}, z_i)$ and either consider encoder/decoder
interactions [49], [4] that do not have dynamics akin to the
feedback loop and random process input in Figure 1, or have
very specific statistical assumptions (e.g. the Gauss-Markov
source and additive Gaussian channel [39], [20, Ch. 6]).



C. Paper Outline and Main Results 

We now outline the paper, where in each section we provide 
bullet points about how it differs from other formulations and 
its main results. 

Section II provides mathematical notation and definitions
that will be used throughout the manuscript.

Section III provides the problem setup. We emphasize
the following properties that make it differ from traditional
approaches:

• the Markov process source has a general alphabet $\mathsf{W}$

• the traditional cost function $g(w_i, z_i)$ is replaced by

$$g(w_i, x_i, z_{i-1}, z_i) = \rho(w_i, z_{i-1}, z_i) + \alpha \eta(x_i) \qquad (3)$$

• decision variables lie in arbitrary spaces $\mathsf{X}$ and $\mathsf{Z}$

Section IV considers a fixed cost function $g$ and finding
optimal coordination strategies $(e, d)$. Results include:

• a structural result demonstrating the existence of optimal
coordination strategies operating on sufficient statistics,
capturing traditional results [37] as a special case.

Section V considers the sequential information gain cost
function (2) with $\mathsf{Z} = \mathcal{P}(\mathsf{W})$ and finding optimal coordination
strategies. Results include:

• an optimal coordination strategy always specifies $Z_i = B_{i|i}$

• a characterization of the problem as cost-penalized
maximization of mutual information $I(W^n; Y^n)$

The first result uses dynamic programming and the second
law of thermodynamics for Markov chains [50]. It synergizes
with work in [31] but differs in how this is cast in the causal
coding/decoding framework and the information gain cost.

Section VI considers the inverse optimal control scenario
with cost functions of the form (3). Results demonstrate that
a 'stationary Markov' coordination strategy $(e, d)$ is inverse
optimal when the channel outputs $Y_1, \dots, Y_n$ are statistically
independent. The technique constructs the induced $\rho$ and
$\eta$ from the variational equations for the rate-distortion and
capacity-cost functions and can be interpreted as a
'source-channel matching' [49], [4] generalization applicable to the
causal coding/decoding problem with cost function (3). It is
also shown how in some situations, this sufficient condition
reduces to time-reversibility of a Markov chain, thus further
demonstrating a relationship between thermodynamics and
information theory that has been developed in [31], [32].

Section VII provides example problems for which the
aforementioned results apply, and shows how:

• under a particular constraint, the hidden Markov model
and nonlinear filter [52] are an optimal coordination
strategy for the information gain cost (2) with $\mathsf{W} = \mathsf{X}$

• the posterior matching scheme [15] is an optimal
coordination strategy for the information gain cost (2) and
source model $W_i = W_{i-1}$ with $\mathsf{W} = [0, 1]$

• the structural results aid the design of optimal and
'user-friendly' coordination strategies for brain-machine
interfaces [53]

• inverse-optimal Markov coordination policies with
cost $g$ exist for:

- Gauss-Markov source, AGN channel pair

- Markov counting-function source, Z channel pair

- Markov counting-function source, 'inverted E' channel
pair

The first example is related to the variational characterization
of the optimality of the nonlinear filter [31], but is different
due to the information gain cost (2). The second example
generalizes the result of [19] because here, $Z_i$ is a decision
variable. In the inverse optimal control examples, the Gaussian
channel case pertains to the decentralized control problems in
[20, Ch. 6], [39] with quadratic state cost and squared error
distortion; the Z channel case pertains to the ·/M/1 queue for
timing channels [54], [55]; the 'inverted E' channel pertains to
Blackwell's trapdoor communication channel [56], [57], [58].

Section VIII provides a discussion and conclusion, followed
by references and an appendix of proofs.

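II. Definitions and Notations

Probabilistic Notation:

• For a sequence $a_1, a_2, \dots$, denote $a_i^j$ as $(a_i, \dots, a_j)$ and
$a^j \triangleq a_1^j$.

• Denote the probability space with sample space $\Omega$,
sigma-algebra $\mathcal{F}$ and probability measure $\mathbb{P}$ as $(\Omega, \mathcal{F}, \mathbb{P})$.

• For a given $(\Omega, \mathcal{F}, \mathbb{P})$ and a Borel space $(\mathsf{V}, \mathcal{B}(\mathsf{V}))$, denote
any measurable function $V : \Omega \to \mathsf{V}$ as a random object.
If $\mathsf{V} = \mathbb{R}$, then $V$ is termed a random variable.

• Upper-case letters $V$ represent random objects and
lowercase letters $v \in \mathsf{V}$ represent their realizations.

• For two probability measures $P$ and $Q$ on $(\Omega, \mathcal{F})$, we say
that $P$ is absolutely continuous with respect to $Q$ (denoted
by $P \ll Q$) if $Q(A) = 0$ implies $P(A) = 0$ for all $A \in \mathcal{F}$.
If $P \ll Q$, denote the Radon-Nikodym derivative as any
random variable $\frac{dP}{dQ} : \Omega \to \mathbb{R}$ that satisfies

$$P(A) = \int_{A} \frac{dP}{dQ}(\omega)\, Q(d\omega), \quad A \in \mathcal{F}.$$

• Denote $\mathcal{P}(\mathsf{V})$ as the space of probability measures on
$(\mathsf{V}, \mathcal{B}(\mathsf{V}))$. For any random object $V : \Omega \to \mathsf{V}$, denote

$$P_V(A) \triangleq \mathbb{P}(V \in A) = \mathbb{P}(\{\omega : V(\omega) \in A\}), \quad A \in \mathcal{B}(\mathsf{V}).$$

• Denote the conditional probability distribution of one
random object $V$ given that another $U$ takes on $u$ as

$$P_{V|U=u}(A) \triangleq \mathbb{P}(V \in A \mid U = u), \quad A \in \mathcal{B}(\mathsf{V}).$$

Markov Chains Notation:

• A random process $V = (V_i : i \geq 1)$ is a Markov chain if

$$P_{V_{i+1}|V^i=v^i}(A) = P_{V_{i+1}|V_i=v_i}(A), \quad A \in \mathcal{B}(\mathsf{V}). \qquad (4)$$

It is time-homogeneous if $P_{V_{i+1}|V_i=v_i}(A) = Q(A|v_i)$.

• A Markov chain is time-reversible if the forward and
reverse time processes are statistically indistinguishable:

$$(V_j : 1 \leq j \leq n) \overset{d}{=} (V_{n-j+1} : 1 \leq j \leq n) \qquad (5)$$

where $\overset{d}{=}$ denotes equivalence in distribution.

Information Theoretic Notation:

• Given two probability measures $P, Q \in \mathcal{P}(\mathsf{V})$, define the
Kullback-Leibler divergence as

$$D(P\|Q) \triangleq \begin{cases} \int_{\mathsf{V}} \log \frac{dP}{dQ}(v)\, P(dv), & \text{if } P \ll Q \\ +\infty, & \text{otherwise} \end{cases} \qquad (6)$$

• Given two sets of conditional distributions
$(P_{V|U=u}, Q_{V|U=u} \in \mathcal{P}(\mathsf{V}) : u \in \mathsf{U})$ and a distribution
$P_U \in \mathcal{P}(\mathsf{U})$, define the conditional divergence as

$$D(P_{V|U}\|Q_{V|U}|P_U) \triangleq \int_{\mathsf{U}} D(P_{V|U=u}\|Q_{V|U=u})\, P_U(du) \qquad (7)$$

• Consider a set of conditional distributions $(P_{V|U=u} \in
\mathcal{P}(\mathsf{V}) : u \in \mathsf{U})$ and a distribution $P_U \in \mathcal{P}(\mathsf{U})$. This
induces a marginal distribution $P_V \in \mathcal{P}(\mathsf{V})$. The mutual
information is given by

$$I(P_{V|U}, P_U) = I(V; U) \triangleq D(P_{V|U}\|P_V|P_U). \qquad (8)$$

$U$ and $V$ are independent if and only if $I(V; U) = 0$.

• The conditional mutual information is given by

$$I(W; Y_2|Y_1) \triangleq D(P_{W|Y_1,Y_2}\|P_{W|Y_1}|P_{Y_1,Y_2}) \qquad (9)$$

• The chain rule for mutual information is given by

$$I(W; Y^n) = \sum_{i=1}^{n} I(W; Y_i|Y^{i-1}). \qquad (10)$$

• Consider a memoryless channel $P_{Y|X} = (Q_{Y|X=x} \in
\mathcal{P}(\mathsf{Y}) : x \in \mathsf{X})$, a cost function $\eta : \mathsf{X} \to \mathbb{R}_+$, and an
upper bound $L \in \mathbb{R}_+$. Define the capacity-cost function
$C(\eta, P_{Y|X}, L)$ [59] and its maximizing distribution
$P_X^*(\eta, P_{Y|X}, L)$ as:

$$P_X^*(\eta, P_{Y|X}, L) \triangleq \arg\max_{P_X \in \mathcal{P}(\mathsf{X}) \text{ s.t. } \mathbb{E}[\eta(X)] \leq L} I(P_X, P_{Y|X}) \qquad (11)$$

$$C(\eta, P_{Y|X}, L) \triangleq I(P_X^*(\eta, P_{Y|X}, L), P_{Y|X}). \qquad (12)$$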
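
For finite alphabets, the divergence in (6) and the mutual information in (8) reduce to finite sums. The following is a minimal sketch under that assumption; the function names, the list-based representation of distributions, and the binary symmetric channel used as an example are illustrative choices and do not appear in the paper.

```python
import math

def kl_divergence(p, q):
    """D(P||Q) in nats for two pmfs given as equal-length lists, cf. (6)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue              # a zero-mass point contributes nothing
        if qi == 0.0:
            return math.inf       # P is not absolutely continuous w.r.t. Q
        total += pi * math.log(pi / qi)
    return total

def mutual_information(p_u, p_v_given_u):
    """I(V;U) = D(P_{V|U} || P_V | P_U), cf. (7) and (8).

    p_u[u] is the pmf of U and p_v_given_u[u][v] is P_{V|U=u}(v).
    """
    n_u, n_v = len(p_u), len(p_v_given_u[0])
    # Marginal P_V induced by P_U and the conditional distributions.
    p_v = [sum(p_u[u] * p_v_given_u[u][v] for u in range(n_u)) for v in range(n_v)]
    return sum(p_u[u] * kl_divergence(p_v_given_u[u], p_v)
               for u in range(n_u) if p_u[u] > 0.0)

# Hypothetical example: uniform input to a binary symmetric channel
# with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
mi = mutual_information([0.5, 0.5], bsc)
```

With identical rows in `p_v_given_u` the function returns zero, matching the statement that $U$ and $V$ are independent if and only if $I(V; U) = 0$.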



III. Problem Setup 

Throughout this discussion, we consider four random processes
$W, X, Y, Z$ associated with Borel metric spaces $\mathsf{W}, \mathsf{X}, \mathsf{Y}, \mathsf{Z}$ that
are coupled according to Figure 1. The natural time ordering
for the causal construction of the four random objects through
time is given by:

$$\dots, Z_{i-1}, \underbrace{W_i, X_i, Y_i, Z_i}_{i\text{th epoch}}, W_{i+1}, \dots$$

The input process:

$W$ is a time-homogeneous Markov process such that for any
$A \in \mathcal{F}_{\mathsf{W}}$:

$$P_{W_{i+1}|W^i=w^i, X^i=x^i, Y^i=y^i}(A) = P_{W_{i+1}|W_i=w_i}(A) \qquad (13)$$
$$= Q_W(A|w_i) \qquad (14)$$

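The causal encoder:

The causal encoder at time $i$ has causal information about the
source, $W^i$, and causal feedback about the channel outputs,
$Y^{i-1}$, to specify the next channel input, $X_i$:

$$X_i = e_i(W^i, Y^{i-1}) \qquad (15)$$

We define the aspect of $e_i$ that maps $W_i$ to $X_i$ as $\tilde{e}_i \in \mathsf{E}$,
where $\mathsf{E}$ is a space of Borel-measurable functions $f : \mathsf{W} \to \mathsf{X}$:

$$e_i(w^{i-1}, \cdot, y^{i-1}) = \tilde{e}_i(\cdot), \qquad (16)$$

and we define $\mathsf{E}_i$ to be the space of Borel-measurable functions
$f : \mathsf{W}^i \times \mathsf{Y}^{i-1} \to \mathsf{X}$ such that $f(w^{i-1}, \cdot, y^{i-1}) \in \mathsf{E}$ for all
$w^{i-1}$ and $y^{i-1}$.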
The memoryless non-anticipative channel:

$X_i \in \mathsf{X}$ is passed through a time-homogeneous,
non-anticipative, memoryless channel to produce $Y_i \in \mathsf{Y}$; for any
$A \in \mathcal{F}_{\mathsf{Y}}$:

$$P_{Y_i|Y^{i-1}=y^{i-1}, X^i=x^i, W^i=w^i}(A) = P_{Y|X}(A|x_i) \qquad (17)$$


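The causal decoder:

Lastly, the causal decoder at time $i$ uses causal channel
outputs, $Y^i$, to specify $Z_i \in \mathsf{Z}$. Define $\mathsf{D}_i$ as a space of
Borel-measurable functions $f : \mathsf{Y}^i \to \mathsf{Z}$ and $\mathsf{D} = \mathsf{D}_1 \times \dots \times \mathsf{D}_n$.
Then the causal decoder $d \in \mathsf{D}$ is a sequence of functions
$d = (d_i : 1 \leq i \leq n)$:

$$Z_i = d_i(Y^i) \qquad (18)$$
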

Belief update:

In the above discussion on the causal decoder, we deliberately
consider $\mathsf{Z}$ to be general, not necessarily equal to $\mathsf{W}$. Indeed,
as we shall see, in some cases we set $\mathsf{Z} = \mathcal{P}(\mathsf{W})$ so that
the outputs of the causal decoder represent beliefs about the
source at time $i$. Define the beliefs $B_{i|j} \in \mathcal{P}(\mathsf{W})$ about the
source at time $i$ given the decoder's observations up until time
$j \leq i$ as, for any $A \in \mathcal{B}(\mathsf{W})$:

$$B_{i|j}(A) \triangleq \mathbb{P}(W_i \in A \mid Y^j) \qquad (19a)$$
$$b_{i|j}(A) \triangleq P_{W_i|Y^j=y^j}(A) \qquad (19b)$$

The beliefs can be interpreted as state variables that can be
updated sequentially given new observations. The nonlinear
filter $\Lambda : \mathcal{P}(\mathsf{W}) \times \mathsf{Y} \times \mathsf{E} \to \mathcal{P}(\mathsf{W})$ and one-step prediction
update $\Phi : \mathcal{P}(\mathsf{W}) \to \mathcal{P}(\mathsf{W})$ rules are given by

$$\Lambda(b, y, \tilde{e})(dw) = \frac{dP_{Y|X}(\cdot|\tilde{e}(w))}{dP_{\Lambda}(\cdot|b, \tilde{e})}(y)\, \Phi(b)(dw) \qquad (20)$$

$$\Phi(b)(dw) = \int_{w' \in \mathsf{W}} Q_W(dw|w')\, b(dw') \qquad (21)$$

$$P_{\Lambda}(dy|b, \tilde{e}) \triangleq \int_{w' \in \mathsf{W}} P_{Y|X}(dy|\tilde{e}(w'))\, \Phi(b)(dw') \qquad (22)$$

(20) can be interpreted as a standard manifestation of Bayes'
rule: the numerator is simply the likelihood, the denominator
is a normalization constant, and the coefficient $\Phi(b)$ is simply
the prior. The aforementioned equations specify how the
beliefs are sequentially updated:

Lemma III.1 ([37], [52]). For any $i$ and encoder policy $e_i$
with associated $\tilde{e}_i$ given by (16), the following holds:

$$b_{i|i-1} = \Phi(b_{i-1|i-1}) \qquad (23a)$$
$$b_{i|i} = \Lambda(b_{i-1|i-1}, y_i, \tilde{e}_i) \qquad (23b)$$

In Section IV we demonstrate using a structural result how
the beliefs arise as sufficient statistics in our main problem.
In Section V we demonstrate how they additionally serve as
optimal decision variables with information gain cost (2).
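
The belief recursion of Lemma III.1 can be sketched for finite alphabets, where the integrals in (20)-(22) become sums. This is only an illustration under that assumption: the function names, the row-stochastic list representation `Q[w_prev][w]` and `channel[x][y]`, and the numerical values are hypothetical and not taken from the paper.

```python
def predict(belief, Q):
    """One-step prediction Phi(b)(w) = sum_{w'} Q[w'][w] * b[w'], cf. (21)."""
    n = len(belief)
    return [sum(Q[wp][w] * belief[wp] for wp in range(n)) for w in range(n)]

def nonlinear_filter(belief, y, enc, Q, channel):
    """Bayes update Lambda(b, y, e), cf. (20) and (22).

    belief     : b_{i-1|i-1} as a list over source symbols
    enc[w]     : channel input chosen for source symbol w (the map W -> X)
    channel[x] : list with channel[x][y] = P_{Y|X}(y | x)
    """
    prior = predict(belief, Q)                                  # b_{i|i-1}
    likelihood = [channel[enc[w]][y] for w in range(len(prior))]
    evidence = sum(l * p for l, p in zip(likelihood, prior))    # P_Lambda(y | b, e)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [l * p / evidence for l, p in zip(likelihood, prior)]  # b_{i|i}

# Hypothetical binary example: Markov source, identity encoder, BSC(0.1).
Q = [[0.9, 0.1], [0.2, 0.8]]
channel = [[0.9, 0.1], [0.1, 0.9]]
posterior = nonlinear_filter([0.5, 0.5], 0, [0, 1], Q, channel)
```

Calling `nonlinear_filter` repeatedly, feeding each posterior back in as the next `belief`, gives the sequential update (23a)-(23b).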

Additive cost function:

Denote a coordination strategy, also termed policy, as $\pi =
(e_1, \dots, e_n, d_1, \dots, d_n)$ and the set of all feasible policies as
$\Pi = \{\pi : e_i \in \mathsf{E}_i, d_i \in \mathsf{D}_i\}$. The causal encoder $e$ and decoder
$d$ cooperate to achieve a common goal. The
performance of their cooperation is measured in terms of an
expected sum of costs over time horizon $n$ with the following
structure:

$$J_{n,\pi} \triangleq \mathbb{E}\left[ \sum_{i=1}^{n} \rho(W_i, Z_{i-1}, Z_i) + \alpha \eta(X_i) \right] \qquad (24)$$

The above expectation is taken with respect to an initial
distribution $P_{W_0, Z_0}$, where $Z_0$ is assumed known to the encoder
and decoder. We assume that the functions $\rho$ and $\eta$ along with
the constant $\alpha$ have the following structure:

• $\rho : \mathsf{W} \times \mathsf{Z} \times \mathsf{Z} \to \mathbb{R}_+$ is a 'distortion-like' source cost
that relates the distortion between the source at time $i$
and the outputs in the vicinity of time $i$.

• $\eta : \mathsf{X} \to \mathbb{R}_+$ is a 'power-like' channel input cost that
penalizes channel inputs that deviate significantly from
nominal desired values.

• $\alpha \in \mathbb{R}_+$ balances the relative importance of the two costs.

Definition III.2. We say that a sequential encoder-decoder
pair $\pi^* \in \Pi$ is (globally) optimal if

$$J_{n,\pi^*} \leq J_{n,\pi} \quad \text{for all } \pi \in \Pi. \qquad (25)$$

IV. Main Structural Results


In this section, we prove that - under mild technical
assumptions - for a general class of cost functions $(\rho, \eta, \alpha)$
inducing an average cost specified in (24), an optimal
belief-based policy-estimator pair exists with the structure shown
in Fig. 4.

Fig. 4. Structural Result and Sufficient Statistics

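We first consider the basic solution approach to the problem
by demonstrating an example with two time steps. The
essence of the idea is as follows:

$$\min_{e_1, d_1, e_2, d_2} \sum_{i=1}^{2} \mathbb{E}\left[\rho(W_i, Z_{i-1}, Z_i) + \alpha \eta(X_i)\right] = \min_{e_1} \mathbb{E}\Big[ \underbrace{\mathbb{E}[\alpha \eta(X_1) \mid E_1]}_{\text{controller at stage 0}} + \min_{e_2, d_1} \mathbb{E}\Big[ \underbrace{\mathbb{E}[\rho(W_1, Z_0, Z_1) \mid Z_0, Y^1, Z_1] + \mathbb{E}[\alpha \eta(X_2) \mid Z_0, Y^1, E_2]}_{\text{controller at stage 1}} + \min_{d_2} \mathbb{E}\big[ \underbrace{\mathbb{E}[\rho(W_2, Z_1, Z_2) \mid Z_1, Y^2, Z_2]}_{\text{controller at stage 2}} \big] \Big] \Big] \qquad (26a)$$

We next demonstrate that the conditional expectations in
(26) can be described in terms of these and other variables
whose alphabets do not grow with $i$: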

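Lemma IV.1. For a fixed policy $\pi = (e, d)$, define $b_{i|i} \in
\mathcal{P}(\mathsf{W})$ as in (19), and $e_{i+1}(w^i, \cdot, y^i) = \tilde{e}_{i+1}(\cdot) \in \mathsf{E}$ as in
(16). Define the state space $\mathsf{S} = \mathsf{Z} \times \mathcal{P}(\mathsf{W})$ and control space
$\mathsf{U} = \mathsf{E} \times \mathsf{Z}$ with $s_i \in \mathsf{S}$, $u_i \in \mathsf{U}$ given by

$$s_i = (z_{i-1}, b_{i|i}), \quad u_i = (\tilde{e}_{i+1}, z_i) \qquad (27)$$

Then

$$\mathbb{E}\left[\rho(W_i, Z_{i-1}, Z_i) \mid Z_{i-1}, Y^i, Z_i\right] = \bar{\rho}(S_i, Z_i) \qquad (28a)$$

$$\mathbb{E}\left[\eta(X_{i+1}) \mid Z_{i-1}, Y^i, \tilde{E}_{i+1}\right] = \bar{\eta}(S_i, \tilde{E}_{i+1}) \qquad (28b)$$
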

To emphasize, this identifies state ($s$) and control ($u$)
variables whose alphabets do not grow with $i$ and on which the
'distortion'-like and 'cost'-like functions solely operate. The
definitions of $\bar{\rho}$ and $\bar{\eta}$ along with the lemma's proof can be
found in Appendix A. We now demonstrate that these state
and control variables comprise a controlled Markov chain:

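Lemma IV.2. The state $s_i = (z_{i-1}, b_{i|i})$ and control $u_i =
(\tilde{e}_{i+1}, z_i)$ variables comprise a controlled Markov chain:

(a)
$$J_{n,\pi} = \mathbb{E}\left[ \sum_{i=0}^{n} g_i(S_i, U_i) \right] \qquad (29)$$

$$g_i(s, u) = \begin{cases} \alpha \bar{\eta}(s_i, \tilde{e}_{i+1}), & i = 0 \\ \bar{\rho}(s_i, z_i) + \alpha \bar{\eta}(s_i, \tilde{e}_{i+1}), & 1 \leq i \leq n - 1 \\ \bar{\rho}(s_i, z_i), & i = n \end{cases} \qquad (30)$$

(b)
$$P_{S_{i+1}|S^i=s^i, U^i=u^i}(ds_{i+1}) = P_{S_{i+1}|S_i=s_i, U_i=u_i}(ds_{i+1}) = Q_S(ds_{i+1}|s_i, u_i)$$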

The proof of (a) follows directly from the law of iterated
expectation and the definition (28). The proof of (b) can be
found in Appendix B. Now define the cost-to-go function at
stage $n - k$ as $V_{n-k} : \mathsf{S} \to \mathbb{R}$. Then for $V_{n+1}(s) = 0$ and
$k = 0, \dots, n$ define:

$$V_{n-k}(s) = \inf_{u \in \mathsf{U}} \left[ g_{n-k}(s, u) + \int V_{n-k+1}(s')\, Q_S(ds'|s, u) \right] \qquad (31)$$
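
The recursion (31) can be sketched numerically when the state and control spaces are replaced by finite sets. This is an assumption made purely for illustration, since in the paper the state space $\mathsf{S} = \mathsf{Z} \times \mathcal{P}(\mathsf{W})$ is in general not finite. The function name, the array layout, and the toy costs and transition kernel below are hypothetical.

```python
def backward_induction(stage_costs, Q_S, n_states, n_controls):
    """Finite-state version of the cost-to-go recursion (31).

    stage_costs[i][s][u] : g_i(s, u) for stages i = 0, ..., n
    Q_S[s][u][s_next]    : controlled transition kernel
    Returns (V, policy), where V[i][s] is the cost-to-go at stage i and
    policy[i][s] is a minimizing control; V[n+1] is identically zero.
    """
    n = len(stage_costs) - 1
    V = [[0.0] * n_states for _ in range(n + 2)]
    policy = [[0] * n_states for _ in range(n + 1)]
    for i in range(n, -1, -1):                    # stages n, n-1, ..., 0
        for s in range(n_states):
            best_u, best = 0, float("inf")
            for u in range(n_controls):
                cont = sum(Q_S[s][u][sp] * V[i + 1][sp] for sp in range(n_states))
                total = stage_costs[i][s][u] + cont
                if total < best:
                    best_u, best = u, total
            V[i][s] = best
            policy[i][s] = best_u
    return V, policy

# Toy example: control 0 keeps the state, control 1 flips it.
g = [[1.0, 1.5], [0.0, 3.0]]                      # same cost at both stages
Q_S = [[[1.0, 0.0], [0.0, 1.0]],
       [[0.0, 1.0], [1.0, 0.0]]]
V, policy = backward_induction([g, g], Q_S, n_states=2, n_controls=2)
```

In this toy instance the minimum over the finite control set always exists, which mirrors the attainment condition required in the theorem that follows.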



Note that by grouping $(d_i, e_{i+1})$ in this manner, in stage $i$,
$d_i : \mathsf{Y}^i \to \mathsf{Z}$ and $e_{i+1} : \mathsf{W}^{i+1} \times \mathsf{Y}^i \to \mathsf{X}$ have access
to a common piece of information, $y^i$ (and thus also $z_i$).
Note that $e_{i+1}(w^i, \cdot, y^i) = \tilde{e}_{i+1}(\cdot) : \mathsf{W} \to \mathsf{X}$ is a mapping
as given by (16), whose alphabet, $\mathsf{E}$, does not grow with $i$.
We secondly consider the belief $b_{i|i}$, which is a function of
$(y^i, \tilde{e}_1, \dots, \tilde{e}_i)$ and whose alphabet, $\mathcal{P}(\mathsf{W})$, does not grow
with $i$. The 'control' actions taken by the encoder and decoder
are given by $\tilde{e}_{i+1}$ and $z_i$, respectively.



This allows us to state the main theorem of this section:

Theorem IV.3. If for each $s \in \mathsf{S}$, the infimum in (31) is
attained and the functions $(V_k : k = 0, \dots, n)$ are universally
measurable, then there exists an optimal encoder/decoder
policy pair $(e^*, d^*)$ of the form

$$e^*_{i+1}(w^{i+1}, y^i) = \tilde{e}^*_{i+1}(w_{i+1}, z_{i-1}, b_{i|i}) \qquad (32a)$$

$$d^*_i(y^i) = \tilde{d}^*_i(z_{i-1}, b_{i|i}) \qquad (32b)$$

Proof: Using standard dynamic programming arguments
[3, Chapter 8], we have that $J_{n,\pi} \geq \mathbb{E}[V_0(S_0)]$ for any
$\pi \in \Pi$. Next, $J_{n,\pi^*} = \mathbb{E}[V_0(S_0)]$, and this cost can be achieved by a policy of the form
(32) that attains the infimum of (31) for each $s$
[3, Prop 8.6]. ■

The structural result in graphical form is shown in Figure 4.
Note that within the causal encoder, the first process is a filter
that computes sufficient statistics. From here, these sufficient
statistics are given to another encoder, $\tilde{e}_i$, that uses them, along
with the current source value, $W_i$, to specify the next channel
input $X_i$. Analogously, the causal decoder is comprised of
first the same recursive filter that computes sufficient statistics,
followed by another decoder, $\tilde{d}_i$, that computes $Z_i$.

We now note that 'universal measurability' [3] is usually
satisfied:

Remark 1. Standard technical assumptions guarantee
universal measurability and that the infimum is attained; one
example is as follows: (a) $\mathsf{W}$, $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$ are compact
Borel metric spaces, (b) $\rho$ and $\eta$ are lower semi-continuous,
(c) $P_{Y|X}(dy|x)$ and $Q_W(dw|w')$ are continuous stochastic
kernels, and (d) $\mathsf{E}$ is an equicontinuous space of functions.

We also note that our result generalizes the classical result of
Walrand and Varaiya [37]:

Remark 2. This result specializes to the result in [37], which
assumes all alphabets are finite, $\eta = 0$, and $\rho(w_i, z_{i-1}, z_i) =
\rho(w_i, z_i)$: (i) because of the finite alphabets and costs, the
infimum is attained in (31); (ii) (27) can be collapsed to
$s_i = b_{i|i}$ because of the absence of $z_{i-1}$ in the function $\rho$.
Secondly, our proof technique differs from [37, Sec. IV] in
that we replace the three-step proof technique of ([37, Thm 1,
Lemma 1, Thm 2]) - which includes two DP arguments ([37,
Thm 1, Thm 2]) - with a single DP argument.

However, our emphasis is not solely on allowing general
alphabets or using a cost function of a particular form;
both of these have in essence been accomplished using
state augmentation and dynamic programming over general
spaces. Rather, our emphasis is to carefully augment standard
formulations to uncover an interplay between information theory and
control theory problems, as we shall see in the next section.

V. The Sequential Information Gain Cost 

In this section, we specifically consider a class of problems
that are not covered in traditional causal coding/decoding
frameworks [37], [38], [39], [20, Ch. 6].

Traditional problems consider cost functions of the form
$\rho(w_i, z_i)$ and assume that either all alphabets are finite
[37], [38], or $\mathsf{W} = \mathsf{Z} = \mathbb{R}$ [39], [20, Ch. 6]. Motivated
by the feedback communication example in Section II-A, we now
assume that $\mathsf{Z} = \mathcal{P}(\mathsf{W})$, the space of possible beliefs on
the source. Secondly, we construct $\rho$ to be a log-likelihood
ratio that is suggestive of an 'information gain'-like
quantity.

The following lemma describes the relationship between
$I(W^n; Y^n)$ and $I(W^n \to Y^n)$ for our problem setup.
Because there is no feedback loop from $Y$ to the
generative process of $W$, these two quantities are equivalent:

Lemma V.1. For any 'sufficient statistic operating' encoder
$\pi \in \Pi$ satisfying (32a), i.e. $x_{i+1} = e_{i+1}(w_{i+1}, b_{i|i})$, the
following holds:

$$I(W^n; Y^n) = I(W^n \to Y^n) = \sum_{i=1}^{n} I(W_i; Y_i \mid Y^{i-1}).$$



The proof is in Appendix C. Note that from the structural
result in Theorem IV.3 there is no loss in performance in
restricting attention to encoders of the form (32a). Under such
encoders, the mutual information can be expressed
as an accumulation of sequential information gains:



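$$I(W^n; Y^n) = \sum_{i=1}^{n} I(W_i; Y_i \mid Y^{i-1}) \tag{33a}$$

$$= \sum_{i=1}^{n} \mathbb{E}\left[ D\left(B_{i|i} \,\|\, B_{i|i-1}\right) \right] \tag{33b}$$

$$= \sum_{i=1}^{n} \mathbb{E}\left[ \log \frac{dB_{i|i}}{dB_{i|i-1}}(W_i) \right] \tag{33c}$$

where (33a) follows from Lemma V.1; (33b) follows from (9)
and (19); and (33c) follows from (21).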
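The chain of equalities in (33) can be checked numerically on a small finite model. The sketch below is only an illustration under assumptions that are not in the paper: a binary Markov source, the identity encoder $X_i = W_i$, a binary symmetric channel, and the hypothetical parameter values `P0`, `Q`, `EPS` and `N`. It computes $I(W^n; Y^n)$ by brute-force enumeration and compares it with the accumulated expected divergence between the filtered belief $B_{i|i}$ and the predicted belief $B_{i|i-1}$.

```python
import itertools
import math

# Toy model (an assumption for illustration, not taken from the paper):
# binary Markov source, identity encoder X_i = W_i, binary symmetric channel.
P0 = [0.5, 0.5]                   # law of W_1
Q = [[0.9, 0.1], [0.2, 0.8]]      # Q[v][w] = P(W_i = w | W_{i-1} = v)
EPS = 0.1                         # channel crossover probability
N = 3                             # horizon n


def chan(y, x):
    """P(Y_i = y | X_i = x) for the binary symmetric channel."""
    return 1.0 - EPS if y == x else EPS


def joint(w, y):
    """P(W^n = w, Y^n = y) under the identity encoder."""
    p = P0[w[0]] * chan(y[0], w[0])
    for i in range(1, len(w)):
        p *= Q[w[i - 1]][w[i]] * chan(y[i], w[i])
    return p


def mutual_information(n):
    """I(W^n; Y^n) in nats, by brute-force enumeration."""
    seqs = list(itertools.product([0, 1], repeat=n))
    p_w = {w: sum(joint(w, y) for y in seqs) for w in seqs}
    p_y = {y: sum(joint(w, y) for w in seqs) for y in seqs}
    return sum(joint(w, y) * math.log(joint(w, y) / (p_w[w] * p_y[y]))
               for w in seqs for y in seqs)


def kl(p, q):
    """Divergence D(p || q) in nats on a finite alphabet."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def accumulated_information_gain(n):
    """sum_i E[ D(B_{i|i} || B_{i|i-1}) ], averaging over all y^n."""
    total = 0.0
    for y in itertools.product([0, 1], repeat=n):
        prior = list(P0)          # B_{1|0}
        prob_y = 1.0              # P(Y^n = y), built from the normalizers
        gain = 0.0
        for yi in y:
            unnorm = [prior[w] * chan(yi, w) for w in (0, 1)]
            norm = sum(unnorm)                    # P(y_i | y^{i-1})
            post = [u / norm for u in unnorm]     # B_{i|i}
            gain += kl(post, prior)
            prob_y *= norm
            # one-step prediction B_{i+1|i}
            prior = [sum(post[v] * Q[v][w] for v in (0, 1)) for w in (0, 1)]
        total += prob_y * gain
    return total


lhs = mutual_information(N)
rhs = accumulated_information_gain(N)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error, since with the identity encoder $Y_i$ depends on the past only through $W_i$.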

One may consider finding encoder policies $e$ in order to
maximize $I(W^n; Y^n)$, using a state space approach over the
space of beliefs. The work in [19] formulated a stochastic control
problem where $B_{i-1|i-1}$ is a state variable and the only decision
variable is the causal encoder's strategy - the decoder did
not specify a decision variable $Z_i$. There, it was shown
that when $W$ is uniformly distributed on $\mathsf{W} = [0,1]$ and
$(W_i = W : i \geq 1)$, the causal encoder given by the
posterior matching scheme of Shayevitz and Feder [15] is an
optimal solution to a control problem where costs are related
to the conditional mutual informations in (33b). Anand and Kumar
[36] have recently considered a related problem where
$(W_i : i \geq 1)$ is a general Markov process over a finite alphabet,
and the cost function is a conditional mutual information.
There too, however, the decoder did not specify a decision
variable $Z_i$.

In this setting, we do not treat $B_{i|i}$ as a state variable;
rather, we first consider a problem in the framework of causal
coding/decoding, where the decoder's decision variable $Z_i$ can
be any possible belief: $\mathsf{Z} = \mathcal{P}(\mathsf{W})$. In order to reward larger
information gains, we define an appropriate cost pertaining
to the negative logarithm of the Radon-Nikodym derivative
evaluated at $W_i$, inspired by the expansion of mutual
information given in (33c):

$$\rho(w_i, z_{i-1}, z_i) = \begin{cases} -\log \dfrac{dz_i}{d\Phi(z_{i-1})}(w_i), & z_i \ll \Phi(z_{i-1}) \\ \infty, & \text{otherwise} \end{cases} \tag{34}$$

The reason we assign $\rho = \infty$ when $z_i \not\ll \Phi(z_{i-1})$ is because
under any reasonable belief-setting strategy, if the belief about
$W_i$ given $y^{i-1}$ - the one-step prediction update (21) given by
$\Phi(z_{i-1})$ - assigns zero probability mass to $A \in \mathcal{B}(\mathsf{W})$, then
the belief about $W_i$ given $y^i$ - which is given by $z_i$ - should
also.

We emphasize here that the beliefs on the source are
themselves decision variables, which are what the causal
decoder must specify. This viewpoint has been used within
the sequential prediction literature [28] and statistical signal
processing [30], but appears not to have been used as frequently
in the literature that attempts to draw synergies between
information theory and control.

Define $Z_0(A) = P(W_0 \in A)$, the distribution on $W_0$. We
now state the following useful lemma that decomposes the
cost into the state and distortion parts, which act on different
aspects of the control input:

Lemma V.2. Under the information gain criterion (34), for a
state variable $s_i = (z, b)$ and control variable $u_i = (e, z')$,

$$\eta(s_i, e) = \int_{w \in \mathsf{W}} \eta(e(w))\, \Phi(b)(dw) \tag{35}$$

$$\rho(s_i, z') = \begin{cases} D(b \,\|\, z') - D(b \,\|\, \Phi(z)), & b \ll z' \ll \Phi(z) \\ \infty, & \text{otherwise} \end{cases} \tag{36}$$

The proof can be found in Appendix D.
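The distortion part of this decomposition can be illustrated on a finite alphabet. The following sketch is a hedged numerical check, not part of the paper's proof: the beliefs `b` and `z_pred` (standing in for the posterior $b$ and the prediction $\Phi(z)$) and the candidate list are hypothetical values chosen so that all absolute-continuity conditions hold. It averages the cost (34) under $W \sim b$, compares it with $D(b \| z') - D(b \| \Phi(z))$, and confirms that, among the candidates, the choice $z' = b$ gives the smallest value.

```python
import math


def kl(p, q):
    """Divergence D(p || q) in nats on a finite alphabet."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def expected_cost(b, z_new, z_pred):
    """E_{W ~ b}[ -log( z_new(W) / z_pred(W) ) ], the averaged cost (34)."""
    return sum(bw * -math.log(zn / zp)
               for bw, zn, zp in zip(b, z_new, z_pred) if bw > 0)


# Hypothetical beliefs on a three-letter alphabet (all strictly positive).
b = [0.7, 0.2, 0.1]          # true posterior on W_i
z_pred = [0.4, 0.4, 0.2]     # one-step prediction Phi(z)
candidates = [b, [0.5, 0.3, 0.2], z_pred, [0.9, 0.05, 0.05]]

costs = [expected_cost(b, z, z_pred) for z in candidates]
best = candidates[costs.index(min(costs))]

for z, c in zip(candidates, costs):
    # The two printed numbers on each line should coincide.
    print(z, c, kl(b, z) - kl(b, z_pred))
```

Because $D(b \| z') \geq 0$ with equality at $z' = b$, the minimum value is $-D(b \| \Phi(z))$, which is the negative of the one-step information gain.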

With this, we state the main theorem of this section. It
says that when treating beliefs as decision variables, under
the information gain criterion (34), the optimal decision rule
for the decoder is to select its belief about $W_i$ to be $z_i = b_{i|i}$,
and the optimal decision rule for the encoder is to maximize
mutual information subject to a cost on channel inputs:



Theorem V.3. Under cost criterion (36), there exists an
optimal encoder/decoder policy pair $(e^*, d^*)$ of the form

$$x_i = e^*(w_i, b_{i-1|i-1}) \tag{37}$$

$$z_i = b^*_{i|i} \tag{38}$$

where $b^*_{i|i} = \Lambda(b_{i-1|i-1}, y_i, e^*(\cdot, b_{i-1|i-1}))$ and the optimal
cost is given by

(39)

Fig. 5. Simplified structural result with $\mathsf{Z} = \mathcal{P}(\mathsf{W})$ and sequential
information gain cost (36). Block diagram labels: $W_i$, causal
encoder, $X_i$, noisy channel, nonlinear filter (twice), delay.

The proof can be found in Appendix E.



Remark 3. The proof of Theorem V.3 (Appendix E) uses
dynamic programming, the second law of thermodynamics
for Markov chains, and exploits how the divergence acts as
a Lyapunov function for the stability of the nonlinear filter.
This further demonstrates an interesting relationship between
information theory and thermodynamics [31], [32]. This idea
of using beliefs as decision variables where the posterior belief
is optimal has been used in sequential prediction [28] and
in variational approaches to nonlinear estimation [31], but
within the context of causal coding/decoding problems, this
is, to the best of our knowledge, new.

We will demonstrate in the examples section how this relates
to the hidden Markov model and the nonlinear filter, as well
as the posterior matching scheme [15] for communication of
a message point over a noisy channel with feedback.



VI. Inverse Optimal Control with Stationary Markov 
Coordination Strategies 

In the last section, we demonstrated that for a specific
"information-gain" related cost function (34), there
existed an optimal encoder policy of the form
$X_i = e^*(W_i, Z_{i-1})$ and decoder policy of the form

$$Z_i = B_{i|i} = \Lambda(B_{i-1|i-1}, Y_i, e^*(\cdot, B_{i-1|i-1})) = d(Z_{i-1}, Y_i).$$

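In light of this, we now consider a general Markov process
$W \in \mathsf{W}$ and $Z \in \mathsf{Z}$, where $\mathsf{Z}$ need not be $\mathcal{P}(\mathsf{W})$, and fix
the coordination strategy $\pi$ to be stationary Markov (SM),
meaning that for fixed functions $e : \mathsf{W} \times \mathsf{Z} \to \mathsf{X}$ and
$d : \mathsf{Z} \times \mathsf{Y} \to \mathsf{Z}$, the following holds:

$$x_i = e(w_i, z_{i-1}) \tag{40a}$$

$$z_i = d(z_{i-1}, y_i). \tag{40b}$$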


See Figure 6. The decoder (40b) is sometimes termed a decoder of
'finite-memory' [37], [60]. Since the encoder and decoder both
utilize $z_{i-1}$, this can also be interpreted as a collection of
'equi-memory' encoders and decoders [20, Definition 6.3.2]. For
a fixed SM coordination strategy $\pi$, we compare it against
arbitrary policies of the form $\pi' = (e_1, \ldots, e_n, d_1, \ldots, d_n)$,
where $e_i : \mathsf{W} \times \mathsf{Y}^{i-1} \to \mathsf{X}$ and $d_i : \mathsf{Y}^i \to \mathsf{Z}$ are the
general causal encoders and decoders of the problem formulation.
Here we identify the structure of cost functions
$\rho(w_i, z_{i-1}, z_i)$ under which $\pi$ is globally optimal.

Fig. 6. A stationary Markov coordination strategy. Block diagram
labels: causal encoder, $X_i$, noisy channel, $Y_i$, causal decoder,
and two delay elements.

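Definition VI.1. A coordination strategy $\pi$ is inverse-control
optimal for a source-channel pair $(P_{W^n}, P_{Y|X})$ if
$J^{\alpha}_{n,\pi} \leq J^{\alpha}_{n,\pi'}$ for all $\pi' \in \Pi$, for some $\alpha > 0$,
$\rho : \mathsf{W} \times \mathsf{Z} \times \mathsf{Z} \to \mathbb{R}$ and $\eta : \mathsf{X} \to \mathbb{R}_+$.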

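To develop high-level conditions under which $\pi$ is indeed
inverse-control optimal, we first develop some preliminary
machinery. Fix a specific $\rho$ and $\eta$ function. For any
coordination strategy $\pi$, define $P^{\pi}_{Z^n|W^n}$ as the conditional
distribution induced by the statistical law under $\pi$, and also define:

$$D_\pi \triangleq \frac{1}{n}\, \mathbb{E}_\pi\left[ \sum_{i=1}^{n} \rho(W_i, Z_{i-1}, Z_i) \right] \tag{41}$$

$$L_\pi \triangleq \frac{1}{n}\, \mathbb{E}_\pi\left[ \sum_{i=1}^{n} \eta(X_i) \right] \tag{42}$$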



Define the rate-distortion function for $P_{W^n}$ and $\rho$ as [2]

$$R_n(\rho, P_{W^n}, D) \triangleq \min_{P_{Z^n|W^n} :\ \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \rho(W_i, Z_{i-1}, Z_i)\right] \leq D} \ \frac{1}{n}\, I\left(P_{W^n}, P_{Z^n|W^n}\right) \tag{43}$$

and denote $P^{*}_{Z^n|W^n}(\rho, P_{W^n}, D)$ as the minimizer in (43). We
now state the following standard lemma:

Lemma VI.2. Fix a $P_{W^n}$, $P_{Y|X}$, $\rho$, and $\eta$. Then

$$R_n(\rho, P_{W^n}, D_\pi) \leq \frac{1}{n}\, I\left(P_{W^n}, P^{\pi}_{Z^n|W^n}\right) \tag{44a}$$

$$\leq \frac{1}{n}\, I\left(P_{W^n}, P^{\pi}_{Y^n|W^n}\right) \tag{44b}$$

$$\leq \frac{1}{n} \sum_{i=1}^{n} I\left(P_{X_i}, P_{Y|X}\right) \tag{44c}$$

$$\leq C(\eta, P_{Y|X}, L_\pi) \tag{44d}$$

where equality holds if and only if:

(a) $P^{\pi}_{Z^n|W^n} = P^{*}_{Z^n|W^n}(\rho, P_{W^n}, D_\pi)$
(b) $I\left(P_{W^n}, P^{\pi}_{Z^n|W^n}\right) = I\left(P_{W^n}, P^{\pi}_{Y^n|W^n}\right)$
(c) $I(Y_i; Y^{i-1}) = 0$ for each $i$
(d) $P_{X_i} = P^{*}_X(\eta, P_{Y|X}, L_\pi)$ for each $i$

The proof is standard [2] but for the sake of completeness,
we include it in Appendix F. This leads to an intermediate
sufficient condition for inverse control optimality that applies
for any $\pi \in \Pi$ (e.g. $\pi$ need not be stationary-Markov):

Lemma VI.3. If a policy $\pi \in \Pi$ results in (44) holding with
equality, then it is inverse control optimal.

Proof: Note that $J^{\alpha}_{n,\pi} = D_\pi + \alpha L_\pi = \langle (1, \alpha), (D_\pi, L_\pi) \rangle$.
Define $\mathcal{R} = \{(D_{\pi'}, L_{\pi'}) : \pi' \in \Pi\}$. Define $\bar{\Pi}$ to be the
set of randomized policies in $\Pi$. Note that any $\pi' \in \bar{\Pi}$ still
induces a conditional distribution $P^{\pi'}_{Z^n|W^n}$ and thus an induced
$D_{\pi'}$ and $L_{\pi'}$, so that we may define
$\bar{\mathcal{R}} = \{(D_{\pi'}, L_{\pi'}) : \pi' \in \bar{\Pi}\}$. Clearly, $\mathcal{R} \subset \bar{\mathcal{R}}$,
and secondly, $\bar{\mathcal{R}}$ is convex. Next, note that if (44) holds with
equality for some $\pi \in \Pi$, then $D_\pi \leq D_{\pi'}$ for any $\pi' \in \bar{\Pi}$
for which $L_{\pi'} \leq L_\pi$, from the definition of
$R_n(\rho, P_{W^n}, D_\pi)$ in (43) and of the capacity-cost function
$C(\eta, P_{Y|X}, L_\pi)$ (see also [4, Lemma 1]). Thus $(D_\pi, L_\pi)$ is a
boundary point of $\bar{\mathcal{R}}$. Therefore there exists a supporting
hyperplane parametrized by $\alpha > 0$ that intersects $(D_\pi, L_\pi)$:

$$J^{\alpha}_{n,\pi} = \langle (1, \alpha), (D_\pi, L_\pi) \rangle \leq \langle (1, \alpha), (D_{\pi'}, L_{\pi'}) \rangle = J^{\alpha}_{n,\pi'}$$

for all $\pi' \in \bar{\Pi} \supset \Pi$ [61], where $\langle \cdot, \cdot \rangle$ denotes inner product.
■

We now consider SM policies for which condition (c) in
Lemma VI.2 holds, and demonstrate a stationary Markov
relationship between the $(W^n, Z^n)$ random variables:

Lemma VI.4. If an SM coordination strategy $\pi$ (40) induces
the channel outputs $(Y_i : 1 \leq i \leq n)$ being i.i.d., then

$$P^{\pi}_{Z_i | Z^{i-1} = z^{i-1}, W^n = w^n}(dz_i) = Q_{Z'|Z,W}(dz_i \mid z_{i-1}, w_i) \tag{45a}$$

$$P^{\pi}_{Z_i | Z^{i-1} = z^{i-1}}(dz_i) = Q^{\pi}_{Z'|Z}(dz_i \mid z_{i-1}) \tag{45b}$$

The proof of this can be found in Appendix G and exploits
the equivalence between a random process being a
time-homogeneous Markov chain and it being represented as an
iterated function system [62]. With this, we can now state the
main theorem of this section:

Theorem VI.5. If under an SM policy $\pi$, $(Y_i : 1 \leq i \leq n)$ are
i.i.d. and $I\left(P_{W^n}, P^{\pi}_{Z^n|W^n}\right) = I\left(P_{W^n}, P^{\pi}_{Y^n|W^n}\right)$, then $\pi$ is
inverse control optimal with $\rho$ and $\eta$ given by

$$\eta(x) \propto_+ D\left(P_{Y|X=x} \,\|\, P^{\pi}_Y\right) \tag{46a}$$

$$\rho(w_i, z_{i-1}, z_i) \propto_+ -\log \frac{dQ_{Z'|Z,W}(\cdot \mid z_{i-1}, w_i)}{dQ^{\pi}_{Z'|Z}(\cdot \mid z_{i-1})}(z_i) \tag{46b}$$

where $\propto_+$ denotes proportional to with a positive constant.

Proof: Note that it suffices to show that (44) holds with
equality and then invoke Lemma VI.3. First note that from the
theorem statement, condition (b) in Lemma VI.2 clearly holds.
Since $(Y_i : i = 1, \ldots, n)$ are i.i.d. and since the
channel is memoryless, it follows that the $(X_i : i = 1, \ldots, n)$
are identically distributed, and so condition (c) in Lemma VI.2
holds. Thus it remains to show that conditions (a) and (d) in
Lemma VI.2 hold.

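The variational equations for an optimal solution to (43)
state that a necessary and sufficient condition for
$P^{\pi}_{Z^n|W^n} = P^{*}_{Z^n|W^n}(\rho, P_{W^n}, D_\pi)$ is the following
relationship [2]:

$$\frac{dP^{\pi}_{Z^n|W^n=w^n}}{dP^{\pi}_{Z^n}}(z^n) = c(w^n)\, e^{-s \sum_{i=1}^{n} \rho(w_i, z_{i-1}, z_i)} \tag{47}$$

for some $s > 0$. For our case, note that

$$\log \frac{dP^{\pi}_{Z^n|W^n=w^n}}{dP^{\pi}_{Z^n}}(z^n) = \sum_{i=1}^{n} \log \frac{dP^{\pi}_{Z_i|Z^{i-1}=z^{i-1}, W^n=w^n}}{dP^{\pi}_{Z_i|Z^{i-1}=z^{i-1}}}(z_i) = \sum_{i=1}^{n} \log \frac{dQ_{Z'|Z,W}(\cdot \mid z_{i-1}, w_i)}{dQ^{\pi}_{Z'|Z}(\cdot \mid z_{i-1})}(z_i) \tag{48}$$

where (48) follows from Lemma VI.4. Thus we see that with
$\rho$ given by (46b), from (47), condition (a) of
Lemma VI.2 holds.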

Lastly, condition (d) of Lemma VI.2 holds if and
only if each $P^{\pi}_{X_i} = P^{*}_X(\eta, P_{Y|X}, L_\pi)$. Variational arguments
([4, Lemma 3], [48, p. 147]) demonstrate that this criterion is
equivalent to (46a). ■



Corollary VI.6. If the function $d(z_{i-1}, \cdot) = d_{z_{i-1}}(\cdot)$ in (40b)
is invertible, then condition (46b) in Theorem VI.5 becomes

$$\rho(w_i, z_{i-1}, z_i) \propto_+ -\log \frac{dP_{Y|X=e(w_i, z_{i-1})}}{dP^{\pi}_Y}\left(d^{-1}_{z_{i-1}}(z_i)\right).$$

We now relate this to 'source-channel' matching and discuss
how it is, in some sense, also 'natural' within the inverse
control framework to have a distortion function of the form
$\rho(w_i, z_{i-1}, z_i)$:

Remark 4. The problem setup leading up to Theorem VI.5
is philosophically inspired by the 'source channel matching'
work [49] - but here, we are relating this to a causal
coding/decoding problem with causal encoder feedback, and
time-invariant additive costs. These two properties appear
to make the distortion function $\rho(w_i, z_{i-1}, z_i)$ - as compared
to $\rho(w_i, z_i)$ - crucially important: note the time-invariant
statistical relationships in Lemma VI.4 and how they relate to
$\rho(w_i, z_{i-1}, z_i)$ in (48) and Corollary VI.6 pertaining to
condition (a) in Lemma VI.2. With this more general
$\rho(w_i, z_{i-1}, z_i)$ framework, we can characterize time-invariant
cost functions for problems where neither $W$ nor $Z$ is stationary
(see the linear quadratic Gaussian decentralized control and M/M/1
queue examples in Section VII).

Next, we demonstrate how time-reversibility of an appropriately
defined Markov chain can serve as a sufficient condition for
Theorem VI.5 - and thus inverse control optimality - to hold
for SM coordination strategies.

A. Time-Reversibility of Markov Chains and Inverse Optimal 
Control 

Time reversibility plays an important role in disciplines
concerning dynamical systems, e.g. in physics (conservation
laws); statistical mechanics (in terms of equilibrium states);
stochastic processes (e.g. queuing networks [63], [64] and
convergence rates of Markov chains [65, ch. 20]); and biology
(e.g. trans paths in ion channels [66]). However, its use
as a sufficient condition to saturate fundamental
information-theoretic limits appears to be somewhat limited.
One noteworthy exception is how Mitter and colleagues
have related Markov chain reversibility to the rate of entropy
production in non-equilibrium thermodynamics [51], [32, Remark
2.1].

In queuing systems, the celebrated Burke's theorem [64],
[67] uses Markov chain time reversibility to show that, in
a certain stochastic dynamical system - an M/M/1 queue in
steady-state - the state of the system (queue) at time $i$ is
statistically independent of all outputs (departures) before time
$i$. This observation has been used in proving achievability
theorems for queuing timing channels [54], [55], [68],
and for implementing recursive schemes that maximize mutual
information according to the converse to the channel cod- 
ing theorem with feedback |[l6l, lUll, ESI, ES]. Here we 
demonstrate how time reversibility of Markov chains provides 
a sufficient condition for inverse optimal control with SM 
coordination strategies. 

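We first note that from (14), $W$ is a time-homogeneous
Markov chain and so it can be represented as an iterated
function system [62]:

$$W_i = \psi(\tilde{W}_i, W_{i-1}) = \psi_{\tilde{W}_i}(W_{i-1}), \quad i \geq 1 \tag{49}$$

where the $\tilde{W}_i$ are i.i.d. To ensure that the source evolves
without feedback from the channel, we assume

$$I(\tilde{W}_i; X^{i-1}, Y^{i-1}) = 0. \tag{50}$$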

We next suppose the structure of the SM coordination 
strategy is such that the following assumption holds 

Definition VI.7. We say that the SM coordination strat- 
egy TT = (e, d) elicits ' reversibly feasible dynamics ' if 

I^„j and the statistical 

dynamics can be described as 

= f{X,^,,m) = f^^_^(W,) (51) 

X, = g{X,,Y,) = gxAY^) (52) 

where fj^ ^ : W — > X and gxi : Y X are W-a.s. invertible 
functions for j = 1, . . . , n. 



Note that $\bar{x}$ in condition (52) is the update to the state after
the output of the channel is taken into consideration and before
the source innovation $\tilde{w}$ is applied to the state.

We now show an example that is related to feedback
communication with posterior matching [15]:

Example 1. Lef W = X = [0, 1] and Z = V (W). Then the 



'posterior matching ' scheme dlSJ given by 

Wo ^ unifO,\\, W^ = \,i>l (53a) 

Wo = Wo = W^-u « > 1 (53b) 

Z, = (53c) 

Xo = 0, X, =X,_i,i > 1 (53d) 



X, = Z,-i{%Wr]) ^ Fx\Y{X^\Yi) (53e) 

elicits reversibly feasible dynamics. Note that this clearly
is an SM coordination policy because the decoder is
given by the nonlinear filter, and for the encoder, this follows
from the first equality in (53e). To verify that the last equality
in (53e) holds, see [15, Corollary 6].

Next, we consider an SM coordination strategy that results in
a birth-death Markov chain [63], [64] where $X$ can increase or
decrease by at most 1 from time $i$ to time $i + 1$ (see Figure 7):

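Example 2. Let $\mathsf{W} = \tilde{\mathsf{W}} = \mathsf{X} = \mathsf{Y} = \mathsf{Z} = \mathbb{F}$ for some field $\mathbb{F}$.
Then the following SM coordination strategy

$$W_i = W_{i-1} + \tilde{W}_i \tag{54a}$$

$$Z_i = Z_{i-1} + Y_i \tag{54b}$$

$$\bar{X}_i = X_i - Y_i \tag{54c}$$

$$X_i = W_i - Z_{i-1} = \bar{X}_{i-1} + \tilde{W}_i \tag{54d}$$

elicits reversibly feasible dynamics. This follows from
inspection. See Figure 7.

Fig. 7. A birth-death Markov chain $X$.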
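The time-reversibility hypothesis used in this subsection can be checked mechanically for a finite birth-death chain through detailed balance, $\pi_i P_{ij} = \pi_j P_{ji}$. The sketch below is an illustrative assumption rather than the paper's construction: it truncates the chain to `SIZE` states with reflecting boundaries and uses hypothetical constant birth and death probabilities. It builds the stationary law from the detailed-balance recursion and then verifies both stationarity and detailed balance.

```python
def birth_death_chain(p_up, p_down, size):
    """Transition matrix of a finite birth-death chain with reflecting ends."""
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i + 1 < size:
            P[i][i + 1] = p_up          # birth: move up by one
        if i > 0:
            P[i][i - 1] = p_down        # death: move down by one
        P[i][i] = 1.0 - sum(P[i])       # otherwise stay put
    return P


def stationary_from_detailed_balance(P):
    """Solve pi_i P[i][i+1] = pi_{i+1} P[i+1][i] and normalize."""
    pi = [1.0]
    for i in range(len(P) - 1):
        pi.append(pi[i] * P[i][i + 1] / P[i + 1][i])
    total = sum(pi)
    return [v / total for v in pi]


SIZE = 6                                 # hypothetical truncation level
P = birth_death_chain(0.3, 0.5, SIZE)    # hypothetical birth/death probabilities
pi = stationary_from_detailed_balance(P)

# Stationarity: pi P = pi.
pi_next = [sum(pi[i] * P[i][j] for i in range(SIZE)) for j in range(SIZE)]

# Detailed balance for every pair of states, i.e. time-reversibility
# of the chain started from pi.
balanced = all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
               for i in range(SIZE) for j in range(SIZE))
print(pi, balanced)
```

Since a birth-death chain only moves between neighboring states, the recursion along neighbors determines the stationary law completely, which is why such chains started in stationarity are reversible.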

Lemma VI.8. Consider an SM coordination strategy
with dynamics given by (54), where $P(\tilde{W}_i \in \{0, 1\}) =
P(Y_i \in \{0, 1\})$. If $X$ is a time-reversible Markov chain, then
$(X, \bar{X})$ is jointly a time-reversible Markov chain, the $Y_i$ are
i.i.d., and $\pi$ is inverse-control optimal.

The proof that $(X, \bar{X})$ is jointly stationary and that the $Y_i$
are i.i.d. is a generalization [69] of the discrete-time Burke's
theorem [67] from queuing theory. From there, the lemma
follows by simply invoking Definition VI.7 and Theorem VI.5.
As of now, time-reversibility is only tied to inverse control
optimality in the algebraic setup of Example 2. This alludes
to there being a more general statement under which
time-reversibility implies inverse optimality:

Lemma VI.9. If an SM coordination strategy $\pi$ elicits
reversibly feasible dynamics and $(X, \bar{X})$ is jointly a
time-reversible Markov chain, then $\pi$ is inverse-control optimal.

Proof: We now develop a generalization of the discrete-time
proof of Burke's theorem [67], [69] from queuing theory.
Note from Definition VI.7 that



(55) 

Y, = g],]{X,) (56) 
Now note from the time-reversibility assumption, we have that 
Xi,Xi, . . . , Xi^i, Xij~(^X2i~i, X2i-2, ■ ■ ■ , Xi, Xi^ (57) 
Re-arranging terms, we have 



Xi,x'-\x'-^ 



^ (58) 

= (59) 



=>I{X,;X'-\X'-') 
^I{X,-Y'-\X'-^) - I{X,-Xf-^,Wfl-^^) (60) 

where (60) follows from invariance of mutual information
to bijective transformations and (51)-(52), (55)-(56). Analogously,
from (58),

= I{X,-X^l-^\Xf-^) (61) 
^I(X,-X'-^\Y'-^) = I[X,-W^l-^^\Xf-^) (62) 
Therefore 

/(X,;F*-i) = I{X,-Wfl-^) (63) 
= (64) 

where (63) follows from the chain rule of mutual information
and subtracting (62) from (60); and (64) follows from
(50). Because of the nature of the memoryless channel $P_{Y|X}$,
it follows that $I(Y_i; Y^{i-1}) = 0$ for all $i$. Moreover,
because Markov chain reversibility implies stationarity, it
follows that $(Y_i : 1 \leq i \leq n)$ are i.i.d. Thus we can invoke
Theorem VI.5. ■

B. Information Gain Cost and Inverse Optimal Control 

In the beginning of this section, we motivated the definition
of stationary Markov coordination strategies, of the type
$x_i = e_i(w_i, z_{i-1})$ and $z_i = d_i(z_{i-1}, y_i)$, by noting from
Section V that such an optimal decoder exists when
$\mathsf{Z} = \mathcal{P}(\mathsf{W})$ and the cost function is of the "information-gain"
related structure (34):

$$Z_i = B_{i|i} = \Lambda(B_{i-1|i-1}, Y_i, e^*(\cdot, B_{i-1|i-1})) = d(Z_{i-1}, Y_i).$$

We now demonstrate that the information gain cost function
in Section V can be seen to be a consequence of our inverse
optimal control framework for any coordination strategy for
which $(Y_i : 1 \leq i \leq n)$ are i.i.d. and $d$ is the nonlinear filter:

Lemma VI.10. Let $\mathsf{Z} = \mathcal{P}(\mathsf{W})$. If an SM coordination strategy
$\pi$ contains a nonlinear filter decoder
$z_i = d(z_{i-1}, y_i) = \Lambda(z_{i-1}, y_i, e(\cdot, z_{i-1}))$ and
$(Y_i : 1 \leq i \leq n)$ are i.i.d., then $\pi$ is inverse control
optimal with information gain distortion
$\rho(w_i, z_{i-1}, z_i) = -\log \frac{dz_i}{d\Phi(z_{i-1})}(w_i)$ and state cost function
$\eta(x) = D(P_{Y|X=x} \,\|\, P_Y)$. The optimal cost is given by

$$J^{\alpha}_{n,\pi} = (\alpha - 1)\, n\, C(\eta, P_{Y|X}, L_\pi). \tag{65}$$



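Proof: First note that under this policy $\pi$, $Z_i = B_{i|i}$. Now
note that clearly

$$P(W_i \in A \mid Y^i) = Z_i(A) = P(W_i \in A \mid Z_{i-1}, Z_i), \tag{66a}$$

$$P(W_i \in A \mid Y^{i-1}) = \Phi(Z_{i-1})(A) = P(W_i \in A \mid Z_{i-1}). \tag{66b}$$

As such, we have that

$$\frac{dz_i}{d\Phi(z_{i-1})}(w_i) = \frac{dP^{\pi}_{W_i | Z_{i-1}=z_{i-1}, Z_i=z_i}}{dP^{\pi}_{W_i | Z_{i-1}=z_{i-1}}}(w_i) \tag{67}$$

$$= \frac{dP^{\pi}_{Z_i | Z_{i-1}=z_{i-1}, W_i=w_i}}{dP^{\pi}_{Z_i | Z_{i-1}=z_{i-1}}}(z_i) \tag{68}$$

$$= \frac{dQ_{Z'|Z,W}(\cdot \mid z_{i-1}, w_i)}{dQ^{\pi}_{Z'|Z}(\cdot \mid z_{i-1})}(z_i) \tag{69}$$

where (67) follows from (66); (68) follows from a simple
application of Bayes' rule,

$$\frac{P(A \mid B, C)}{P(A \mid B)} = \frac{P(C \mid A, B)}{P(C \mid B)};$$

and (69) follows from Lemma VI.4. Also, since $Z_i = B_{i|i}$, it follows
that $I(W^n; Z^n) = I(W^n; Y^n)$. Thus Theorem VI.5 applies
and so $\pi$ is inverse control optimal. To characterize the final
cost, note that for the associated $\alpha$,

$$J^{\alpha}_{n,\pi} = -I(W^n; Y^n) + \alpha\, \mathbb{E}\left[ \sum_{i=1}^{n} \eta(X_i) \right] \tag{70}$$

$$= -n C(\eta, P_{Y|X}, L_\pi) + \alpha\, \mathbb{E}\left[ \sum_{i=1}^{n} \eta(X_i) \right] \tag{71}$$

$$= -n C(\eta, P_{Y|X}, L_\pi) + \alpha \sum_{i=1}^{n} I(X_i; Y_i) \tag{72}$$

$$= (\alpha - 1)\, n\, C(\eta, P_{Y|X}, L_\pi) \tag{73}$$

where (70) follows from Theorem V.3; (71) follows from
the fact that Theorem VI.5 applies, which means that (44)
holds with equality; (72) follows from the definition of mutual
information and that $\eta(x) = D(P_{Y|X=x} \,\|\, P_Y)$; and (73)
follows from the fact that (44) holds with equality. ■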
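The step from (71) to (72) rests on the identity $\mathbb{E}[\eta(X)] = I(X; Y)$ when $\eta(x) = D(P_{Y|X=x} \| P_Y)$. A minimal numerical sketch of that identity follows; the binary symmetric channel and the input law `p_x` are hypothetical choices made only for illustration.

```python
import math


def kl(p, q):
    """Divergence D(p || q) in nats on a finite alphabet."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def output_law(p_x, channel):
    """P_Y induced by the input law p_x through channel[x][y] = P(y | x)."""
    return [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
            for y in range(len(channel[0]))]


def eta(x, p_x, channel):
    """State cost eta(x) = D(P_{Y|X=x} || P_Y)."""
    return kl(channel[x], output_law(p_x, channel))


def mutual_information(p_x, channel):
    """I(X; Y) in nats from the joint law p_x[x] * channel[x][y]."""
    p_y = output_law(p_x, channel)
    return sum(p_x[x] * channel[x][y] * math.log(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(len(p_y)))


channel = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical binary symmetric channel
p_x = [0.3, 0.7]                     # hypothetical input distribution

expected_eta = sum(p_x[x] * eta(x, p_x, channel) for x in range(2))
print(expected_eta, mutual_information(p_x, channel))
```

With a uniform input on this channel, $\eta(0) = \eta(1)$ equals the channel capacity in nats, which is the sense in which the state cost is flat across inputs at the capacity-achieving distribution.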
Traditionally, inverse optimal control is performed through
finding a control-Lyapunov function [40], which involves
performing a sequential decomposition of the problem and finding
a consistent value function (31). When $\mathsf{Z} = \mathcal{P}(\mathsf{W})$, this can be
done using only the decision variables and stationary-Markov
coordination strategies as in Section V: $X_i = e_i(W_i, Z_{i-1})$
and $Z_i = B_{i|i} = \Lambda(Z_{i-1}, Y_i, e_i(\cdot, Z_{i-1})) = d(Z_{i-1}, Y_i)$.
This means that using a control-Lyapunov approach, first a
sequential decomposition resting upon the structural result of
Section IV would be needed, with the additional effort of showing
that coordination strategies of the structural result form (32)
can be reduced to stationary Markov strategies of the form (40).
However, our inverse control optimality sufficient conditions
apply for a general $\mathsf{Z}$ (which need not be $\mathcal{P}(\mathsf{W})$) and do not
involve a sequential decomposition. As such, the approach
developed in this section - when applicable - appears to require
'less effort' than typically required in arriving at an inverse
optimal control result.

VII. Examples 

In this section, we provide examples of the Theorems and 
Lemmas from previous sections. 

A. Likelihood Ratio Cost and Information Gain: HMMs 
and the Nonlinear Filter 

We now demonstrate that the information gain cost framework
of Section V establishes the causal coding/decoding
and information-theoretic optimality of the nonlinear filter in a
specific sense. Related work on using variational principles to
characterize the nonlinear filter was reported in [31]. However,
demonstrating that the nonlinear filter acts as an optimal
controller with respect to this information gain cost function
is - to the best of our knowledge - new. We start by considering
the following assumptions:

(i) the source and channel inputs have the same alphabets:
$\mathsf{W} = \mathsf{X}$;

(ii) the causal encoder alphabet is
$\mathsf{E}_i = \{e_i : \mathsf{W} \times \mathsf{Y}^{i-1} \to \mathsf{X}\} = \{=\}$, where $=$ is the
identity function: $X_i = W_i$.

Under these conditions, the only feasible encoder simply
specifies $W_i$ as the channel input, and thus this becomes a
hidden Markov model.









Fig. 8. The information gain cost when the encoder set consists of only the
identity function. This becomes a hidden Markov model where the nonlinear
filter is an optimal solution. Block diagram labels: channel $P_{Y|X}$,
nonlinear filter.



As such, we can consider maximizing the mutual information
from $W^n$ to $Y^n$ over all possible causal decoder
policies. The optimal design of $\{e_i\}$ disappears and the
focus becomes the optimal design of $\{d_i\}$. We now show that,
assuming $Z_0(A) = P(W_0 \in A)$, the optimal policy for the
decoder is given by the true posterior, which can be computed
recursively using the nonlinear filter:

Lemma VII.1. Under assumptions (i) and (ii) above, and cost
functions $\eta = 0$ and $\rho$ given by (34),

$$\rho(w_i, z_{i-1}, z_i) = \begin{cases} -\log \dfrac{dz_i}{d\Phi(z_{i-1})}(w_i), & z_i \ll \Phi(z_{i-1}) \\ \infty, & \text{otherwise} \end{cases}$$

the policy $\pi$ consisting of the identity function encoder and
nonlinear filter decoder $Z_i = B_{i|i} = \Lambda(Z_{i-1}, Y_i, =)$ is
globally optimal, where $J^{\alpha}_{n,\pi} = -I(W^n; Y^n)$.

Proof: Because $\mathsf{E}_i$ is a singleton consisting of the identity
function, and because $\eta(x) = 0$, this follows directly from
Theorem V.3. ■
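For a finite-alphabet hidden Markov model, the nonlinear filter recursion $Z_i = \Lambda(Z_{i-1}, Y_i, e)$ is a prediction step followed by a Bayes update. The sketch below is one way to write that recursion under stated assumptions: the function names, the two-state kernel `Q`, the binary symmetric `channel`, and the observation sequence `ys` are all hypothetical. It checks the recursive output against a brute-force posterior computed by summing over source paths.

```python
import itertools


def filter_step(z_prev, y, encoder, Q, channel):
    """One step Lambda(z_prev, y, e): predict with Q, then Bayes-update on y."""
    n = len(z_prev)
    # One-step prediction Phi(z_prev): belief about W_i given y^{i-1}.
    pred = [sum(z_prev[v] * Q[v][w] for v in range(n)) for w in range(n)]
    # Bayes update with the likelihood P(y | x = e(w)).
    unnorm = [pred[w] * channel[encoder(w)][y] for w in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def brute_force_posterior(z0, ys, encoder, Q, channel):
    """P(W_k = w | y_1..y_k) by enumerating all paths (w_0, ..., w_k)."""
    n = len(z0)
    post = [0.0] * n
    for path in itertools.product(range(n), repeat=len(ys) + 1):
        p = z0[path[0]]
        for i, y in enumerate(ys, start=1):
            p *= Q[path[i - 1]][path[i]] * channel[encoder(path[i])][y]
        post[path[-1]] += p
    total = sum(post)
    return [v / total for v in post]


def identity(w):
    """Identity encoder X_i = W_i, as in assumption (ii)."""
    return w


# Hypothetical two-state model.
Q = [[0.9, 0.1], [0.2, 0.8]]         # source transition kernel
channel = [[0.9, 0.1], [0.1, 0.9]]   # channel[x][y] = P(y | x)
z0 = [0.5, 0.5]                      # Z_0, the law of W_0
ys = [0, 1, 1, 0]                    # an arbitrary observation sequence

z = z0
for y in ys:
    z = filter_step(z, y, identity, Q, channel)
reference = brute_force_posterior(z0, ys, identity, Q, channel)
print(z, reference)
```

Passing the encoder as an argument mirrors the third argument of $\Lambda$; with a feedback encoder one would pass the map $e(\cdot, z_{i-1})$ in place of `identity`.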



B. Likelihood Ratio Cost and Information Gain: Feedback 
Communication of a Message Point 

Given that the natural mathematical framework to han- 
dle feedback is control theory, we consider the problem 
of communication over noisy channels with feedback from 
the dynamical systems perspective, and make use of recent 
sequential approaches to communication. This viewpoint has 
been made largely possible by a recent development in the 
information theory literature - the posterior matching (PM) 
scheme [15] - which generalizes other 'message-point' style
feedback communication schemes ifTTl . ifTSl . lfT6ll : rather than 
nR bits, a message point on the interval [0, 1] is considered. 
The notion of "decoding $nR$ bits" now becomes equivalent
to determining the message point within an interval of length
$2^{-nR}$ at the receiver (see Section II-A).

The implementational details and fundamental limits are
completely in line with traditional communication paradigms
(see [15]), but there are subtle, yet striking differences. Because
the message point is a point on the $[0, 1]$ line, there is no
pre-specified block length; the system operates to sequentially give
the user the information that is "still missing" at the receiver.
Moreover, at each time step, the decoder specifies an output
$Z_i \in \mathcal{P}(\mathsf{W})$, which is a belief about the message point. We
now demonstrate how this notion of communication, and the
problem of finding the optimal encoder with feedback, can be
captured with our framework. Moreover, we will demonstrate
that the PM scheme is an optimal solution to the problem.

Let $\mathsf{W} = [0,1]$ and $\mathsf{Z} = \mathcal{P}(\mathsf{W})$. Further, let the source
process be the 'repetition' Markov process $(W_i = W : i \geq 1)$
with $W$ uniformly distributed over $[0, 1]$. If we assume that
there is an expected cost constraint
$\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[\eta(X_i)] \leq L$,
then we may formulate a communication problem of
communicating a message point over a memoryless channel with
causal feedback. First note that the mutual information
between the message point and observations is given by

$$\frac{1}{n} I(W; Y^n) = \frac{1}{n} \sum_{i=1}^{n} I(W; Y_i \mid Y^{i-1}) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ \log \frac{dB_{i|i}}{dB_{i|i-1}}(W) \right].$$



Shannon's converse to the channel coding theorem with feed- 
back tells us that in order to achieve capacity, this afore- 
mentioned quantity must asymptotically be maximized. This 
allows for us to consider the following maximization problem 



where $\alpha$ serves as a Lagrange multiplier such that under an
optimal policy, the average state cost is upper bounded by $L$.
We note that this can be captured in a causal coding/decoding
framework by considering the sequential information gain
distortion function (34). From Lemma VI.10 we note that
a sufficient condition for optimality in this control problem is
for

• $I(Y_i; Y^{i-1}) = 0$ for all $i$

• $X_i \sim P^{*}_X(\eta, P_{Y|X}, L)$ for all $i$

Let $\mathsf{X} = \mathbb{R}$ and denote $F_X(\cdot)$ as the cumulative distribution
function of the optimal input distribution $P^{*}_X(\eta, P_{Y|X}, L)$. The
posterior matching (PM) scheme [15] simultaneously enables
the two properties to hold for each $i$ and is given by:

$$X_i = F_X^{-1}\left( F_{W|Y^{i-1}}(W \mid Y^{i-1}) \right) \tag{74a}$$

$$= F_X^{-1}\left( B_{i-1|i-1}([0, W]) \right) \tag{74b}$$

$$= e(W, Z_{i-1}) \tag{74c}$$

where (74c) follows from Theorem V.3 and because $W_i = W$.
Note that the $F_{W|Y^{i-1}}$ operation constructs a uniform-$[0, 1]$
random variable that is independent of the past channel
outputs, and the $F_X^{-1}$ shaping operation enables each input to
be drawn according to the optimal channel input distribution
$P^{*}_X(\eta, P_{Y|X}, L)$. Note that from (74c), the PM scheme can
be interpreted as 'minimal' from our structural result in the
causal coding/decoding framework in Section V. Moreover,
the causal encoder is time-invariant, and likewise for the
decoder acting as the nonlinear filter; thus, the
PM scheme can also be interpreted as an instance of the
inverse optimal control framework via Lemma VI.10. See
Figure 9 and its relationship with Figure 2. Also note that in
some cases, the time-reversibility condition for inverse optimal
control in Section VI-A is applicable: from Example 1, the PM
scheme (74) elicits reversibly feasible dynamics. Consider
the additive Gaussian noise channel. Under the PM scheme,
$(X_i, Y_i : i \geq 1)$ are jointly Gaussian (see (53e) and [15,
Example 1]). Note that, by (53d), the joint
reversibility sufficient condition is equivalent to reversibility of
the Markov chain $X$. Since all stationary Gaussian processes
are time-reversible, we see that in this scenario, the
time-reversibility framework for our inverse optimal control
framework is linked to the PM scheme. Although our control
problem only addresses the maximization of mutual information -
which is a necessary condition for reliable communication by
the converse to the channel coding theorem - it can be shown
that reliable communication, as defined in Section II-A, results
as a consequence of the mutual information maximization
control problem under mild technical conditions [15].

Fig. 9. Posterior matching scheme by Shayevitz & Feder, interpreted as a
time-invariant manifestation of the simplified structural result in Figure 5.
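To make the encoder and decoder roles in (74) concrete, the following sketch simulates a discretized version of posterior matching over a binary symmetric channel with no input cost. Several things here are assumptions for illustration only: the unit interval is cut into `cells` grid cells, the capacity-achieving input is Bernoulli(1/2) so that the shaping step $F_X^{-1}(u)$ reduces to the indicator of $u > 1/2$, the threshold is snapped to the cell edge whose posterior CDF value is closest to 1/2, and the function name and parameters are hypothetical. The decoder belief `mass` plays the role of $Z_i$, and both terminals can compute it because of the feedback.

```python
import random


def posterior_matching_bsc(w, eps=0.1, cells=1000, steps=200, seed=0):
    """Grid approximation of posterior matching for a message point w in [0, 1)."""
    rng = random.Random(seed)
    true_cell = min(int(w * cells), cells - 1)
    mass = [1.0 / cells] * cells          # decoder belief Z_0: uniform prior
    for _ in range(steps):
        # Find the cell edge whose posterior CDF value is closest to 1/2
        # (a grid stand-in for the posterior median).
        cdf, best_j, best_gap = 0.0, 0, 0.5
        for j in range(cells):
            cdf += mass[j]
            gap = abs(cdf - 0.5)
            if gap < best_gap:
                best_gap, best_j = gap, j + 1
        # Encoder: send 1 when the message point lies above the chosen edge.
        x = 1 if true_cell >= best_j else 0
        # Binary symmetric channel with crossover probability eps.
        y = x ^ (1 if rng.random() < eps else 0)
        # Decoder: Bayes update of every cell with the likelihood P(y | x_j),
        # where x_j is the input the encoder would send for a point in cell j.
        for j in range(cells):
            xj = 1 if j >= best_j else 0
            mass[j] *= (1.0 - eps) if xj == y else eps
        total = sum(mass)
        mass = [m / total for m in mass]
    return mass, true_cell


mass, true_cell = posterior_matching_bsc(0.3141)
print(true_cell, mass[true_cell])
```

In this grid version the belief can at best concentrate on the cell containing the message point; the continuous scheme in (74) has no such resolution limit.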

C. Structural Result: Brain-Machine Interfaces 

A brain-machine interface (BMI) is a system that elicits 
a direct communication pathway between a human and an 
external device. In many cases, it is the objective of the 
human to control an external device merely by imagination, 
and the external device acquires neural signals, actuates some 



physical system, and perceptual feedback is given to the 
user to complete the loop. We now demonstrate how our 
structural result can be applied to the design of brain-machine 
interfaces that have a 'user-friendly' structure: displaying the 
minimal amount of useful perceptual feedback to the user, 
and designing an interaction strategy between the user and the 
external device. 

Consider a brain-machine interface where a human has a
desired high-level intent represented by the Markov process
$(W_i : i \geq 1)$. At each time step, the human imagines a control
signal $X_i$ which is statistically linked to neural activity $Y_i$ that
is observed by the external device. For example, the statistics
of $Y_i$ are different when imagining a left-oriented movement
($X_i = 0$) as compared to imagining a right-oriented movement
($X_i = 1$) [70]. At each time step, the external device maps all its
recorded observations to actuate some system, whose state
is given by $Z_i$. Equally important, the user gets perceptual
feedback from the external device and uses this, along with
causal information about the high-level intent, $W^i$, to specify
the subsequent imagined control signal $X_i$.

Without loss of generality, because we do not yet know what
perceptual feedback is the most relevant, we could consider a
scenario where all information available to the decoder at any
time $i$ is fed back to the subject. Secondly, we may assume that
we are planning to design the coordination strategy between
the user and the interface: not only how the interface should
take its observations and actuate the plant, but also what
perceptual feedback should be specified back to the user and
how the user should react to the perceptual feedback to specify
the subsequent control signal $X_i$. In such a case, this problem
boils down to our problem formulation in Section III. Note
that because of the causal nature of the problem, real-time
constraints with a human in the loop obviate the possibility of
using 'block-coding' like paradigms. Secondly, such settings
are more complicated than simply optimally representing
intent with an arithmetic coding procedure as in [71], because of
the inherent uncertainty also due to the noisy channel mapping
intent to neural signals.

Almost all previous approaches to designing BMIs have failed to
consider how the desired control signals change in response
to sensory feedback. For example, many previous schemes
simply attempt to recursively estimate $X_i$ from $Y^i$ under the
assumption that $(X_i : i \geq 1)$ is a Markov process. However, as
we know from our structural result, for an arbitrary objective
with an additive cost function, it is crucial for the
system to keep a running estimate, or belief, on $W_i$ given $Y^i$.
Moreover, it is critical that the user and the system
agree on an interaction protocol that specifies both what
sensory feedback is provided to the user (e.g. the sufficient
statistics) and how the user should react to this feedback in
pursuit of the high-level intent (e.g. the function $\tilde{e}_i$).

Our structural result says that a state filter can first construct
sufficient statistics $S_i = (Z_{i-1}, B_{i|i})$; the external
device can then actuate the plant using $S_i$, and the user only needs
to be fed back $S_{i-1}$ as perceptual feedback. This information,
along with the current high-level goal $W_i$, is all that is needed
to specify an optimal causal encoder $\tilde{e}_i$. See Figure 10.

Fig. 10. Structural result within the context of a brain-machine interface:
in an optimal system, the user acts as part of the causal encoder. The other
part accumulates all causal observations and summarizes them into sufficient
statistics acting as perceptual feedback to the user.



In [53], we instantiate this idea in an EEG-based BMI in two
steps. We assume the high-level intent can be mathematically
represented as a Markov process $(W_i = W : i \geq 1)$ on
$\mathsf{W} = [0, 1]$, for which $W$ is uniformly distributed over the
$[0, 1]$ line. This means we are assuming that the whole
high-level intent is known to the user at all times. To relate
this to a variety of practical applications, the user interprets
the message point as a countably infinite sequence of symbols
$D = (D_1, D_2, \ldots)$ over an ordered countable set with a known
statistical model (typically modeled as a fixed-order Markov
process). Examples of the sequence $D$ include an infinite
sequence of text characters or an infinite sequence of small
path arcs pertaining to a smooth path of bounded curvature.
We use arithmetic coding [2] to develop a one-to-one mapping
between any such sequence $D$ and a point $W = \Gamma(D)$
uniformly distributed on the $[0, 1]$ line. We subsequently use
an EEG system and specify a binary-input (left/right motor
imagery) noisy channel with a spatial filter to extract beliefs
$B_{i|i}$ sequentially [70]. With this, we implement the posterior
matching (PM) scheme for the binary symmetric channel (BSC) [16], [15].
Here, what is nice for a human in the loop is that the encoder is
time-invariant ($\tilde{e}_i = \tilde{e}$) and that,
for the BSC, it only requires a functional of the
posterior $B_{i-1|i-1}$ to be given to the encoder at time $i$: the
median, denoted $m(B_{i-1|i-1})$ [16], [15]:

$$X_i = \begin{cases} 0, & W < m(B_{i-1|i-1}) \\ 1, & \text{otherwise} \end{cases} \tag{75}$$
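As a rough illustration of the median rule, the following Python sketch simulates posterior matching over a binary symmetric channel on a discretized version of $[0, 1]$. The function name `posterior_matching_bsc`, the grid size, the explicit `flips` argument (the channel's error pattern), and the choice of snapping the threshold to the cell boundary nearest the posterior median are all assumptions made for this example; they are not taken from the EEG implementation described above.

```python
def posterior_matching_bsc(w, p, flips, grid=256):
    """Posterior over `grid` equal cells of [0, 1) after one channel use per flip.

    w     : message point in [0, 1) known to the encoder
    p     : crossover probability the decoder assumes for the BSC
    flips : iterable of booleans, True where the channel inverts the input
    """
    post = [1.0 / grid] * grid  # uniform prior on the message point
    for flip in flips:
        # Locate the cell containing the posterior median.
        acc, med = 0.0, grid - 1
        for k, mass in enumerate(post):
            acc += mass
            if acc >= 0.5:
                med = k
                break
        cum_left = acc - post[med]
        # Threshold index: the edge of the median cell whose CDF is closest to 1/2.
        t = med if abs(cum_left - 0.5) < abs(acc - 0.5) else med + 1
        x = 0 if w < t / grid else 1      # encoder: compare message point with threshold
        y = x ^ int(flip)                 # channel output
        # Bayes update: cell k would have sent 0 if k < t, else 1.
        post = [m * ((1 - p) if (0 if k < t else 1) == y else p)
                for k, m in enumerate(post)]
        total = sum(post)
        post = [m / total for m in post]
    return post
```

For instance, `posterior_matching_bsc(0.3141, 0.1, [False] * 60)` models sixty error-free channel uses that the decoder nevertheless treats as coming from a BSC with crossover 0.1; the posterior mass then piles up on the cell containing 0.3141. Passing a random error pattern (for example, drawn with `random.random() < p`) exercises the noisy case.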



Because of the one-to-one mapping $\Gamma$, at time $i$ this can
be implemented by visually displaying the median path
$\Gamma^{-1}(m(B_{i-1|i-1}))$ on the screen and instructing the user to
obey the time-invariant PM scheme [15] within the context of
the median path. This simply means performing a lexicographic
comparison to $D$ (i.e. identifying the first location where
the sequences differ and performing a symbol-based comparison).
We have successfully implemented this to demonstrate reliable
text spelling and two-dimensional smooth path specification.
Moreover, wedding arithmetic coding with the PM scheme
has the added benefit that a natural 'propagation' of uncertainty
ensues: the locations where $D$ and $\Gamma^{-1}(m(B_{i-1|i-1}))$
differ move to later and later parts of their sequences,
which makes a real-time implementation plausible.
Remote control of an unmanned aerial vehicle using this
paradigm has recently been shown in [72].
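To make the lexicographic comparison concrete, here is a minimal Python sketch of an arithmetic-coding map from a symbol prefix to a subinterval of $[0, 1]$, together with the symbol-based comparison against a displayed median path. It assumes an i.i.d. (order-0) symbol model for brevity, whereas the text allows a fixed-order Markov model. The names `prefix_interval` and `encoder_input`, the three-letter alphabet in the usage note, and the tie-breaking rule when no difference is found are hypothetical.

```python
from fractions import Fraction


def prefix_interval(prefix, probs):
    """Interval [low, high) of message points whose sequence starts with `prefix`.

    probs: ordered list of (symbol, probability) pairs; the list order
    defines the lexicographic order of the alphabet.
    """
    low, width = Fraction(0), Fraction(1)
    for sym in prefix:
        offset = Fraction(0)
        for s, p in probs:
            if s == sym:
                low += width * offset   # skip the mass of all smaller symbols
                width *= p              # shrink to this symbol's share
                break
            offset += p
        else:
            raise ValueError(f"unknown symbol {sym!r}")
    return low, low + width


def encoder_input(intended, median_path, order):
    """Binary input decided at the first position where the sequences differ."""
    for d, m in zip(intended, median_path):
        if d != m:
            return 0 if order.index(d) < order.index(m) else 1
    return 1  # assumption: no visible difference is treated as W >= median
```

With `probs = [("a", Fraction(1, 2)), ("b", Fraction(1, 4)), ("c", Fraction(1, 4))]`, the prefix `"ab"` maps to the interval from 1/4 to 3/8, and every sequence starting with `"ac"` lies to its right, so comparing symbols at the first point of difference agrees with comparing the corresponding points on the line.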

We also comment on why the PM scheme of Shayevitz and
Feder [15] is particularly relevant here. Formulating this problem
as one where the encoder has one of $2^{nR}$ hypotheses
would mean that the human agent attempting to elicit neural
control of an external device would have to implement a
strategy that differentiates possible inputs based upon one of
$2^{nR}$ hypotheses. Even with visualization, this could be cumbersome.
Moreover, it is unclear how the design specification
would change when $n = 100$ as compared to when $n = 101$.
Remarkably, using the posterior matching framework makes
this problem truly solvable both theoretically and practically,
simply by changing the starting point to be $\mathsf{W} = [0, 1]$ and
$\mathsf{Z} = \mathcal{P}(\mathsf{W})$ and defining an appropriate information gain cost
criterion. These observations show how sensitive
information-theoretic problems with the same fundamental
limits are to the way in which they are formulated.

The structural result demonstrated in this paper now enables
the design of many brain-machine interface
paradigms for a variety of cost functions beyond the
information gain paradigm and beyond the assumption that $W_i = W_{i-1}$.
More generally, the structural result has the potential
to bring together several desirable properties on one platform:
(i) guaranteed optimality from a decision-theoretic viewpoint;
(ii) elucidation of the minimal amount of perceptual feedback
information that must be displayed to the user; and (iii)
potential ease of use (e.g. when $\tilde{e}_i = \tilde{e}$ and it has a
simple operational interpretation).



D. Inverse Optimal Control: Gauss-Markov source and 
AGN channel 

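Here we show that a stationary Markov coordination strategy
consisting of a linear 'estimation error' encoder and
MMSE decoder is inverse-control optimal for a Gauss-Markov
source $Q_W$ and a power-constrained additive Gaussian channel. A
variant of this problem for $\rho(w_i, z_{i-1}, z_i) = \rho(w_i, z_i) = (w_i - z_i)^2$
has been studied previously.

Let $\mathsf{W} = \mathsf{X} = \mathsf{Y} = \mathsf{Z} = \mathbb{R}$. The source is a Gauss-Markov
process with i.i.d. $\tilde{W}_i \sim \mathcal{N}(0, \sigma_w^2)$:

$$
\begin{aligned}
W_0 &\sim \mathcal{N}(0, C) && \text{(76a)} \\
W_i &= \rho W_{i-1} + \tilde{W}_i, \quad i \geq 1 && \text{(76b)} \\
I(\tilde{W}_i; X^{i-1}, Y^{i-1}) &= 0, \quad i \geq 1 && \text{(76c)}
\end{aligned}
$$

where $C$ is the steady-state estimation-error variance given in Lemma VII.2.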


Note that we are not assuming that $W$ is stationary. As
such, this problem can be connected to problems in 'control
over noisy channels'. In such problems with quadratic cost and
linear Gaussian dynamics, the essence of optimally solving the
control-over-noisy-channels problem is optimally solving this
causal coding/decoding 'active tracking' problem.

The channel is additive with Gaussian noise (AGN):

$$Y_i = X_i + V_i, \qquad V_i \sim \mathcal{N}(0, \sigma_v^2) \ \text{i.i.d.} \tag{77}$$

A typical objective in practice is to design an encoder
and decoder that minimize the mean-squared error in
estimating the source process, i.e., minimize

$$J(e^n, d^n) = \mathbb{E}\left[\sum_{i=1}^{n} (W_i - Z_i)^2\right].$$

It is known [59], [20] that an
optimal linear coordination strategy exists, pertaining to 'error'
encoding and MMSE estimation decoding:

$$
\begin{aligned}
X_i &= \beta_i \left(W_i - \mathbb{E}[W_i \mid Y^{i-1}]\right) && \text{(78a)} \\
Z_i &= \mathbb{E}[W_i \mid Y^{i}] && \text{(78b)}
\end{aligned}
$$

where $\beta_i$ are time-varying normalizing constants that result in
$X_i \sim \mathcal{N}(0, L)$ for all $i$, and the power constraint $L$ depends
on the value of $\alpha$.

Fig. 11. With $Q_W$ Gauss-Markov and $P_{Y|X}$ an AGN channel, 'error'
encoding and MMSE estimation decoding is inverse control optimal. The
induced cost function is squared error.



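We now view this problem through the lens of
inverse optimal control.

Lemma VII.2. For the problem setup in (76), define the
following stationary Markov coordination policy:

$$
\begin{aligned}
X_i &= \beta (W_i - \rho Z_{i-1}) && \text{(79a)} \\
Z_i &= \rho Z_{i-1} + \gamma Y_i && \text{(79b)}
\end{aligned}
$$

where $\beta = \sqrt{L/C}$, $\gamma = \frac{\sqrt{LC}}{L + \sigma_v^2}$, and $C = \frac{\sigma_w^2}{1 - \rho^2 \sigma_v^2 / (L + \sigma_v^2)}$.

(a) The policy pair in (79) is inverse control optimal for

$$
\begin{aligned}
\eta(x_i) &\propto_+ x_i^2 && \text{(80a)} \\
\rho(w_i, z_{i-1}, z_i) &\propto_+ (w_i - z_i)^2 - \frac{\sigma_v^2}{L + \sigma_v^2} (w_i - \rho z_{i-1})^2 && \text{(80b)}
\end{aligned}
$$

(b) The total cost $J^{\pi}$ can be represented as a weighted
MMSE cost in which the final reconstruction error $(Z_n - W_n)^2$
carries a larger weight than the earlier ones.

The proof is provided in Appendix H.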
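The constants of the lemma can be sanity-checked numerically. The Python sketch below assumes the scalar prediction-error recursion $c_i = \frac{\rho^2 \sigma_v^2}{L + \sigma_v^2} c_{i-1} + \sigma_w^2$ (the standard Kalman recursion for a scalar Gauss-Markov source observed through an AGN channel with input power $L$) and checks that it converges to the closed-form $C$, and that the gains satisfy $\beta^2 C = L$ and $\beta\gamma = L/(L + \sigma_v^2)$. The function names and parameter values are illustrative only.

```python
import math


def steady_state_error_variance(rho, sigma_w2, sigma_v2, L):
    """Fixed point of c = rho^2 * sigma_v2 / (L + sigma_v2) * c + sigma_w2."""
    return sigma_w2 / (1.0 - rho ** 2 * sigma_v2 / (L + sigma_v2))


def iterate_error_variance(rho, sigma_w2, sigma_v2, L, c0, steps):
    """Run the assumed prediction-error recursion from an arbitrary start c0."""
    c = c0
    for _ in range(steps):
        c = rho ** 2 * sigma_v2 / (L + sigma_v2) * c + sigma_w2
    return c


def policy_gains(rho, sigma_w2, sigma_v2, L):
    """Encoder gain beta and decoder gain gamma of the stationary policy."""
    C = steady_state_error_variance(rho, sigma_w2, sigma_v2, L)
    beta = math.sqrt(L / C)                      # makes Var(X_i) = beta^2 * C = L
    gamma = math.sqrt(L * C) / (L + sigma_v2)    # MMSE gain applied to Y_i
    return beta, gamma, C
```

With, say, `rho=0.9`, `sigma_w2=1.0`, `sigma_v2=1.0`, `L=4.0`, the recursion contracts by a factor of 0.162 per step, so any starting value reaches the fixed point within a few dozen iterations.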



Remark 5. The policy pair in (78) is optimal for a
mean-square distortion cost (MMSE) problem for Gauss-Markov
sources, except that the last reconstruction has a
higher penalty. As $n \to \infty$, the cost for which (78)
is optimal is exactly equivalent to an MMSE cost problem,
$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (Z_i - W_i)^2 + \alpha X_i^2$.
Thus, asymptotically, we can recover the results of [59], [20, Ch. 6] using
inverse optimal control and time-invariant cost functions.



E. Inverse Optimal Control: the M/M/1 Queue 

Here we show that the ·/M/1 queue's dynamics can be
interpreted as a stationary Markov coordination strategy that
is inverse control optimal for $Q_W$ being a Poisson process. It is
well known from Burke's theorem [64], [67] that for a Poisson
process of rate $\lambda$ entering a ·/M/1 queue, in steady state the
queue state at time $t$ is independent of the output before time
$t$. We now demonstrate that this statement has implications not
only for the capacity of queuing timing channels [54], [55],
[68], [73], but also for inverse optimal control.

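Divide time into intervals of length $\Delta$, where $\Delta \ll 1$. The input
to the queue $W_i$ represents the number of arrivals to the queue
up to time $i$. For a Poisson source, $(W_i : i \geq 1)$ is the discrete-time
equivalent of the counting function representation of a
Poisson process:

$$P_{W_i|W_{i-1}}(w_i \mid w_{i-1}) = \begin{cases} \lambda\Delta, & w_i = w_{i-1} + 1 \\ 1 - \lambda\Delta, & w_i = w_{i-1} \\ 0, & \text{otherwise} \end{cases} \tag{81}$$

In other words, $W_i = W_{i-1} + \tilde{W}_i$, where the $\tilde{W}_i$ are i.i.d. with
$P(\tilde{W}_i = 1) = \lambda\Delta$. Assume the following model for the
channel:

$$P_{Y|X}(1 \mid x) = \begin{cases} 0, & x = 0 \\ \mu\Delta, & x > 0 \end{cases} \tag{82}$$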

For a queuing system, note that this means that a departure
($Y_i = 1$) can only occur when the number of customers in
the queue is positive, and the likelihood of a departure in
that scenario for a bin of length $\Delta$ is $\mu\Delta$. Continuing
with the queuing analogy, we represent $Z$ as the
counting function representation of the departure process,
$Z_i = \sum_{k \leq i} Y_k$, where $Y_k \in \{0, 1\}$. $X_i$ is the queue size,
representing the number of customers in the queue at the $i$-th
time instant: $X_i = W_i - Z_{i-1}$. Thus, the update equations for
the state $X_i$ and output of the queue $Z_i$ are linear stationary
Markov policies given by

$$
\begin{aligned}
X_i &= W_i - Z_{i-1} && \text{(83a)} \\
Z_i &= Z_{i-1} + Y_i && \text{(83b)}
\end{aligned}
$$

The departure at the $i$-th time instant, $Y_i$, depends on the state through
a discrete memoryless 'Z' channel model: there is no departure if the
queue is empty, and there is a departure with probability $\mu\Delta$ if the queue
is not empty. The initial number of arrivals $W_0$ is drawn
according to $P(W_0 = k) = \left(1 - \frac{\lambda}{\mu}\right) \left(\frac{\lambda}{\mu}\right)^k$, $k \geq 0$, and the
initial number of departures is $Z_0 = 0$. Note that the aggregate
statistical dynamics of $P_{Z^n|W^n}$ in Figure 12 are precisely those
of the discrete-time exponential server timing channel, also
termed a ·/M/1 queue of rate $\mu$, which is a first-come, first-serve
queuing system with i.i.d. service times geometrically
distributed with rate $\mu$ [64]. As $\Delta \to 0$, this becomes the
continuous-time ·/M/1 queue. From standard queuing theory
it follows that $X$ is a birth-death Markov chain in steady state
with distribution

$$\pi(k) = \left(1 - \frac{\lambda}{\mu}\right) \left(\frac{\lambda}{\mu}\right)^k, \quad k \geq 0. \tag{84}$$



Fig. 12. With $Q_W$ a Poisson process and $P_{Y|X}$ a 'Z' channel, the ·/M/1
queue is inverse control optimal.

Fig. 13. $P_{Y|X}$ for the ·/M/1 queue sampled at length-$\Delta$ intervals.

Fig. 14. Birth-death chain for $X$ in the M/M/1 queue.



Therefore Lemma VI.8 holds, and the fixed coordination
strategy given by (83) is inverse-control optimal for a cost
$\rho(w_i, z_{i-1}, z_i)$ that, in the limit $\Delta \to 0$, takes the form

$$\lim_{\Delta \to 0} \rho(w_i, z_{i-1}, z_i) = \begin{cases} \log \frac{\lambda}{\mu}, & x_i > 0,\ y_i = 1 \\ 0, & \text{otherwise} \end{cases} \tag{85}$$

where $x_i = w_i - z_{i-1}$ and $y_i = z_i - z_{i-1}$.

Figure 12 is akin to [54, Fig. 4], where it is shown that this
insight and (85) lead to the derivation of the capacity of the
exponential server timing channel.
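The birth-death structure of the queue length can be checked numerically. The Python sketch below builds the one-step transition matrix of $X_i$ on a truncated state space, under the assumption (made for this example) that within each bin a departure from a non-empty queue and a new arrival occur independently with probabilities $\mu\Delta$ and $\lambda\Delta$. It then obtains the stationary distribution from detailed balance and verifies that it is invariant. The function names and the truncation level `K` are hypothetical.

```python
def queue_transition(lam, mu, delta, K):
    """Transition matrix of the queue length on {0, ..., K} for bins of length delta."""
    a, d = lam * delta, mu * delta
    P = [[0.0] * (K + 1) for _ in range(K + 1)]
    for x in range(K + 1):
        dep = d if x > 0 else 0.0      # 'Z' channel: no departure from an empty queue
        arr = a if x < K else 0.0      # truncation: arrivals are blocked at K
        up = (1.0 - dep) * arr         # arrival without departure
        down = dep * (1.0 - arr)       # departure without arrival
        if x < K:
            P[x][x + 1] = up
        if x > 0:
            P[x][x - 1] = down
        P[x][x] = 1.0 - up - down
    return P


def stationary_by_detailed_balance(P):
    """Stationary law of a birth-death chain via pi[k+1] P[k+1][k] = pi[k] P[k][k+1]."""
    pi = [1.0]
    for k in range(len(P) - 1):
        pi.append(pi[k] * P[k][k + 1] / P[k + 1][k])
    total = sum(pi)
    return [v / total for v in pi]


def stationarity_residual(pi, P):
    """Largest entry of |pi P - pi|; zero (up to rounding) for a stationary law."""
    n = len(pi)
    return max(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) for j in range(n))
```

For small `delta` the ratio of successive stationary probabilities approaches $\lambda/\mu$, which is the geometric form quoted for the continuous-time queue; for example, with `lam=1.0`, `mu=2.0`, `delta=0.01` the ratio away from the boundary is about 0.495.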

Remark 6. Though the ESTC is time-varying, non-memoryless,
and has non-linear dynamics from an inter-arrival
time viewpoint [54], when viewed appropriately, its internal
structure consists of a time-invariant memoryless 'Z' channel
and a feedback loop comprising a linear SM coordination
strategy $\pi$. Moreover, for a Poisson process input, $\pi$ is inverse
control optimal. As such, the internal structure of the
·/M/1 queue can be interpreted as an optimal decentralized
controller. Also, note how the internal structure is exactly
analogous to the Gaussian case ((76), [59], [20]) in that the
encoder and decoder are both linear dynamical systems.

The result differs from the source-channel matching results 
in P9l Sec 3] for two reasons: i) the problem is approached 
through an inter-arrival viewpoint in P9l . while we use 
counting function representation (inputs and outputs to the 
queue), ii) The dynamics of the ESTC are fixed and P9l 
considers a possible encoder between the poisson process and 
the ESTC input, and a decoder between ESTC output and 
the reconstruction and show that the encoder and decoder 
should be identity mappings. In our case, the linear encoder 
and decoder policies are fixed and internal to the structure of 
the queue dynamics. As a consequence, the source-channel
matching has to be performed over a less complicated
memoryless 'Z' channel.

Other extensions to queuing timing channels fit within this
framework as well: see for example the variety of queuing
systems in [69] for which joint reversibility holds. Similar
results hold for other queuing timing channels, such as:

• ·/M/c queue: There are $c$ servers, each with an i.i.d. exponential
service time. In this case, the queue dynamics (the
linear encoder and the decoder) will be the same (Fig. 12).
The structure of the memoryless channel ($P_{Y|X}$) will depend
on $c$.

. 'The queue with feedback' JP] p 204-205]. Here, with 
probability $1 - p_0$, departures from the queue instantaneously
return to the input of the queue (independent of
all other processes). The 'effective' Z channel changes
$\mu\Delta$ to $p_0 \mu\Delta$ and all other arguments hold.



F. Inverse Optimal Control: Blackwell's Trapdoor Channel

Here we show that the internal structure of Blackwell's 
trapdoor channel can be interpreted as a stationary Markov 
coordination strategy that is inverse control optimal. 

Consider 'the chemical (trapdoor) channel' [56], [57], [58],
as shown in Fig. 15. Initially (Fig. 15a), a ball labeled either
0 (red) or 1 (blue) is present in one of the two slots. Then (Fig.
15b) a ball, either a 0 or a 1, is placed in the empty slot, after
which (Fig. 15b) one of the trapdoors opens at random with
probability $(\frac{1}{2}, \frac{1}{2})$. The ball lying above the open door then
falls through. The door closes (as in Fig. 15a) and the process
is repeated.



Fig. 15. Blackwell's Trapdoor Channel

Let $\tilde{W}_i \in \{0, 1\}$ and $Y_i \in \{0, 1\}$ represent the color of
the ball that is input to and output from the trapdoor, respectively.
Define the channel input $X_i$ to pertain to the composition of
balls before one of the doors is opened (Fig. 15b). That is,
$X_i \in \{0, 1, 2\}$, where $X_i = 0$ represents two red balls (0,0),
$X_i = 1$ represents a blue ball and a red ball (0,1), and $X_i = 2$
represents two blue balls (1,1). Thus, the dynamics are given
by



X,, 



X,, 



(86) 



From a counting function viewpoint, let $\{W_i\}$ and $\{Z_i\}$ be the
counting processes representing the number of blue balls that
were input to and output from the system. Hence $X_i$, as defined
above, describes the composition of balls, or equivalently the
number of blue balls that are 'in' the system at time $i$:

$$
\begin{aligned}
W_i &= W_{i-1} + \tilde{W}_i && \text{(87a)} \\
X_i &= W_i - Z_{i-1} && \text{(87b)} \\
Z_i &= Z_{i-1} + Y_i && \text{(87c)}
\end{aligned}
$$

Note that the state-update equation and the decoding policy
(87b)-(87c) are reversibly feasible dynamics by Example 2.
The output depends on the state according to the channel law
$P_{Y|X}(\cdot \mid X)$ (the inverse erasure channel), as shown in Figure 17:

$$P_{Y|X}(1 \mid x) = \begin{cases} 0, & x = 0 \\ \frac{1}{2}, & x = 1 \\ 1, & x = 2 \end{cases} \tag{88}$$
Fig. 16. With $Q_W$ a Markov counting process (i.i.d. $\tilde{W}_i$ inputs) and an
'inverted E' channel, Blackwell's trapdoor channel is inverse control optimal.

Fig. 17. $P_{Y|X}$ for the trapdoor channel.

Fig. 18. Birth-death chain for $X$ in the trapdoor channel with $Q_W$ a
Markov counting process.



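Fix $\tilde{W}$ to be an i.i.d. process with $P(\tilde{W}_i = 0) = p$,
and let $Z_0 = 0$. The transition probabilities of the Markov chain
$X_i$ are given by Fig. 18:

$$P = \begin{bmatrix} p & 1-p & 0 \\ \frac{p}{2} & \frac{1}{2} & \frac{1-p}{2} \\ 0 & p & 1-p \end{bmatrix}$$

and if $W_0$ is drawn according to

$$P(W_0 = k) = \begin{cases} p^2, & k = 0 \\ 2p(1-p), & k = 1 \\ (1-p)^2, & k = 2 \\ 0, & \text{otherwise} \end{cases}$$

it follows that we have a birth-death chain initially in
steady state with distribution $\pi(\cdot) = P(W_0 = \cdot)$. Thus from
Lemma VI.8 we have that $\pi$ is inverse-control optimal.
Moreover, from Corollary VI.6 the trapdoor policy
is optimal for a cost function of the form

$$\rho(w_i, z_{i-1}, z_i) = \begin{cases} \log p, & x_i = 0,\ y_i = 0 \\ \log 2p, & x_i = 1,\ y_i = 0 \\ \log 2(1-p), & x_i = 1,\ y_i = 1 \\ \log(1-p), & x_i = 2,\ y_i = 1 \\ +\infty, & \text{otherwise} \end{cases}$$

so that $\mathbb{E}[\rho(W_i, Z_{i-1}, Z_i)] = -I(\pi, P_{Y|X})$.
Note that when $p = \frac{1}{2}$, $\mathbb{E}[\rho(W_i, Z_{i-1}, Z_i)] = -\frac{1}{2}$ (in bits),
which coincides with the achievable rate coding scheme
developed for the trapdoor channel in [57].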
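The two numerical claims of this subsection, stationarity of the binomial initial law and the value $-\frac{1}{2}$ bit at $p = \frac{1}{2}$, can be verified with a few lines of Python. The sketch below encodes the trapdoor dynamics as described in the text (the ball under the randomly chosen door leaves, then an i.i.d. ball with $P(\text{red}) = p$ enters); the function names are hypothetical.

```python
import math


def trapdoor_model(p):
    """Transition matrix of X, its stationary law, and the channel P(y | x)."""
    P = [[p, 1 - p, 0.0],
         [p / 2, 0.5, (1 - p) / 2],
         [0.0, p, 1 - p]]
    pi = [p * p, 2 * p * (1 - p), (1 - p) ** 2]
    chan = [[1.0, 0.0],   # two red balls: a red ball (0) always leaves
            [0.5, 0.5],   # one of each: either colour leaves with probability 1/2
            [0.0, 1.0]]   # two blue balls: a blue ball (1) always leaves
    return P, pi, chan


def expected_cost_bits(p):
    """E[-log2 P(y|x)/P(y)] under the stationary law, i.e. minus the mutual information."""
    _, pi, chan = trapdoor_model(p)
    py = [sum(pi[x] * chan[x][y] for x in range(3)) for y in range(2)]
    total = 0.0
    for x in range(3):
        for y in range(2):
            if chan[x][y] > 0.0:
                total += pi[x] * chan[x][y] * -math.log2(chan[x][y] / py[y])
    return total


def stationarity_residual(p):
    """Largest entry of |pi P - pi| for the trapdoor chain."""
    P, pi, _ = trapdoor_model(p)
    return max(abs(sum(pi[i] * P[i][j] for i in range(3)) - pi[j]) for j in range(3))
```

Only the pairs $(x, y)$ with positive channel probability contribute to the expectation, which is why the $+\infty$ branch of the cost never appears in the sum.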



VIII. Discussion and Conclusion 

In this paper, we have developed a new class of causal
coding/decoding problems that can be understood through the
lens of both information theory and control theory.
We would like to emphasize that the primary focus of this
paper is not fundamental limits (although some new
fundamental limits are presented). Rather, it is an attempt
to develop a modeling framework whereby principles of
traditional information theory (e.g. KL divergence, mutual
information bounds, etc.) and traditional control concepts (e.g. dynamic
programming, structural results) can be wed to elucidate
and impact the design of future real-world systems and
applications.

A second aim is to demonstrate that by first formulating
problems in this manner (whereby the notion of rate is
not necessarily embedded directly in the problem formulation),
fundamental information-theoretic limits fall out as a
consequence of solving the problem. For example, with our
information gain cost, we demonstrate how the posterior
matching scheme of Shayevitz and Feder is an optimal solution
to a causal coding/decoding problem. We did not attempt to
directly impose the notion of achievability in the cost function,
but rather constructed a cost function from the converse. It
was shown in [15] that in essence achievability still holds.
Moving forward, this suggests that perhaps as much
attention should be paid to the manner in which problems are
formulated as is being paid to attempts to solve already-formulated
and unsolved problems. The authors believe that
significant practical and theoretical advances,
including and extending beyond communication, can be made
if the information theory community embraces this challenge.

In light of how the second law of thermodynamics appeared
in Section V and how Markov chain time-reversibility appeared
in Section VI-A, further work could be pursued
to better understand the relationship between information
theory and thermodynamics. It has recently been suggested
that such an understanding could additionally play a role
in understanding brain function [74], [75]. Recent developments
in the neuroscience community have begun to posit
that Bayesian decision-making could implicitly be playing a
role in the processes of sensation, perception, and decision-making
in the mammalian brain through interacting neural
systems sequentially handing to one another 'what is missing'
[76], [77], [78], [79]. The sequential information gain framework
could perhaps provide insight into these
matters. Analogously, the inverse optimal control framework
developed here, when combined with statistically inferring the
coordination poHcies as in ||43], EH, ES], ED, IS, Q, H, 
could provide insight into what cost is being minimized. 
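Appendix A
Proof of Lemma IV.1

Proof: As described in the statement of the lemma, define
the state space $\mathsf{S} = \mathsf{Z} \times \mathcal{P}(\mathsf{W})$ and control space $\mathsf{U} = \mathsf{E} \times \mathsf{Z}$,
with $s_i \in \mathsf{S}$, $u_i \in \mathsf{U}$ given by (27):

$$s_i = (z_{i-1}, b_{i|i}), \qquad u_i = (e_{i+1}, z_i).$$

Then

$$\bar{\rho}(s_i, z_i) = \int_{w_i \in \mathsf{W}} \rho(w_i, z_{i-1}, z_i)\, b_{i|i}(dw_i) \tag{89}$$

$$\bar{\eta}(s_i, e_{i+1}) = \mathbb{E}\left[\eta(X_{i+1}) \mid Z_{i-1} = z_{i-1}, Y^i = y^i, E_{i+1} = e_{i+1}\right] = \int_{w_{i+1} \in \mathsf{W}} \eta(e_{i+1}(w_{i+1}))\, \Phi(b_{i|i})(dw_{i+1}) \tag{90}$$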

where (|89]l follows from ([Till; & follows from (E). ■ 



Appendix B 
Proof of Lemma IV.2



Proof: Note that 



chain with time-invariant statistical dynamics. 

Appendix C 
Proof of Lemma IvTI 

Proof: 

n 

i=l 

n 



(92) 



(93) 



n 

Y,IiW';Y,\Y^'') 
1=1 

+I{Wr+i;Yt\Y'~\W\Xi) (94) 

n 

J2HW;Yt\Y'-^) (95) 



1=1 



(96) 



{s.+i,i=«.,2} / / ^{fc.+i|,+i=A((,,|,,y,+i,e.+i)} 



^V|x (dyi+i|ei+i(u'i+i))6i+i|j(du;j+i) (91a) 

-'-{Si+l,l=Mi,2} / / -'-{Si+l,2=A(Si,2,yi + l.«i,l)} 

Py\x (rfyi+i|wi,i(^«+i)) $(s.i,2)(rfu'j+l) (91b) 
^S', + i|Si=s,,C/i=u, (dSi+i) 

Qs((isj+i|sj,Ui) (91c) 



i=i 

+I{W'-^-Y,\W,,Y'-^) 

n 

= Y,HWt;Yt\Y^-') 

i=l 

+I{W'-^;Yt\Wt,Y'-\Xt) (97) 

n 

= ^/(W^,;y,|r'-i) (98) 

•i=l 

where ^ follows from ([TO); ([93) follows from from ((TO); 
dUli follows from ([Bli; (|95]l follows follows from ([17); dllll 
follows from ([TOl i; (i97] i follows from our assumption (|32a) that 
the encoder operates on sufficient statistics; and (i98] l follows 
from ([T7]i. ■ 

Appendix D 
Proof of Lemma V.2

Proof: (35) follows directly from Lemma IV.1. Now, let
us focus on (36). From Lemma IV.1 we have that

$$\bar{\rho}(s, z) = \int_{w \in \mathsf{W}} \rho(w, b, z)\, b'(dw) = \int_{w \in \mathsf{W}} -\log \frac{dz}{d\Phi(b)}(w)\, b'(dw) \tag{99}$$

where (99) follows from (34) for any $z$ satisfying $z \ll \Phi(b)$
and is infinite otherwise. Now note that if it is not the case
that $b' \ll z$, then there exists a set $A \in \mathcal{B}(\mathsf{W})$ for which
$z(A) = 0$ and $b'(A) > 0$, and thus it follows that $\frac{dz}{d\Phi(b)}(w) = 0$,
so that $-\log \frac{dz}{d\Phi(b)}(w) = \infty$ for all $w \in A$. Thus $\bar{\rho}(s, z) = \infty$.
Now assume $b' \ll z \ll \Phi(b)$. Then since, if $\beta \ll \nu \ll \mu$, then
$\frac{d\beta}{d\mu} = \frac{d\beta}{d\nu} \frac{d\nu}{d\mu}$ $\mu$-almost everywhere [50, Sec. 5.5], it follows
that

$$\bar{\rho}(s, z) = \int_{w \in \mathsf{W}} -\log \frac{db'}{d\Phi(b)}(w)\, b'(dw) + \int_{w \in \mathsf{W}} \log \frac{db'}{dz}(w)\, b'(dw) = -D(b' \| \Phi(b)) + D(b' \| z) \tag{100}$$

where (91a) follows from (11) and (14); (91b) follows from
(27); and (91c) demonstrates that this is a controlled Markov



Appendix E 
Proof of Theorem V.3
Proof: In order to find the optimal cost $J^*$ given by $J^* = \mathbb{E}[V_0(S_0)]$, we use the standard dynamic programming approach
and evaluate the optimal cost-to-go functions $\{V_k : k = 0, \ldots, n\}$. Consider the final-stage problem of finding $V_n(s_n)$, where
$s_n = (z_{n-1}, b_{n|n})$, and describe any control $u_n$ as $u_n = (e_{n+1}, z_n)$. Then the one-stage problem is

$$
\begin{aligned}
V_n(s_n) &= \inf_{u_n = (e_{n+1}, z_n)} g_n(s_n, u_n) \\
&= \inf_{z_n \in \mathcal{P}(\mathsf{W})} \bar{\rho}(s_n, z_n) && \text{(101)} \\
&= -D(b_{n|n} \| \Phi(z_{n-1})) + \inf_{z_n \in \mathcal{P}(\mathsf{W}):\ b_{n|n} \ll z_n \ll \Phi(z_{n-1})} D(b_{n|n} \| z_n) && \text{(102)} \\
&= -D(b_{n|n} \| \Phi(z_{n-1})) && \text{(103)}
\end{aligned}
$$

where (101) follows from (30); (102) follows from (36); and (103) follows from the non-negativity of the KL divergence. The
optimal choice of $z_n$ is the one for which the equality in (103) holds, and hence under an optimal policy, $Z_n = b_{n|n}$.
This follows the same reasoning that shows how, in sequential probability assignment under self-information loss, the best
probability assignment is the true belief [28].
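For finite alphabets, the final-stage argument reduces to two elementary facts that are easy to check numerically: the expected information-gain cost $\sum_w b(w) \left(-\log \frac{z(w)}{\phi(w)}\right)$ equals $D(b \| z) - D(b \| \phi)$, and it is minimized over $z$ by choosing $z = b$. The Python sketch below does this for small probability vectors; `phi` stands in for the one-step prediction $\Phi(z_{n-1})$, and all names and numbers are illustrative assumptions.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in nats for finite distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def information_gain_cost(b, phi, z):
    """Expected value under b of -log(z(w) / phi(w)); infinite if z misses mass of b."""
    total = 0.0
    for bi, pi, zi in zip(b, phi, z):
        if bi > 0.0:
            if zi == 0.0:
                return math.inf
            total += bi * -math.log(zi / pi)
    return total
```

Evaluating `information_gain_cost(b, phi, b)` returns exactly `-kl(b, phi)`, the value reported in (103), and any other choice of `z` adds the non-negative term `kl(b, z)`.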

For the second-step k = n — I, consider finding Kn_i(s„_i), where s„_i = {zn-2, &Ti-i|n-i) ™d describe any control Un-i 
as Un-i = (e„,z„_i). Then we have; 



14-1 = inf gn-l{Sn-l,Un-l) +E[Vn {Zn-l,B„]n) \Sn-l = Sn~l,Un~-l = Un-l] 

«n-l = (e„ .Zn-l) 



^ inf Q;ry(s„_i,e„) + /o(s„_i,z„-i) +E (S„|„||$(z„_i)) \Sn-i = .s„_i,C/„ 
" ^ 1 

'n-l|n-lll*(z„-2)) +i 

-D (B„|„||$(z„_i)) |5'„_i = Sn-i,En = e. 



(104) 

^n-l] (105) 



-D(fo„-i|„-i||$(z„-2)) +infa77(s„_i,e„) 4- inf £> (&„_i|„_i ||z„_i) 

e b„_i|„_i«2„_i«;*(z„„2) 



(106) 



where (1 105b follows by substituting values of gn-i and Vn from (|30] l and (|103b ; (1106b follows from (|36T l. 
For any fixed encoder policy e„, the optimal choice for Zn-i is to pick = as shown; 



4_i(sn-i) = arginf (&„_i|„_i ||z„_i 

fcn-l|„-l<Zn-l<S'I'(z„-2) 

= arg inf 



D (B„I„||<I>(z„_i)) \Sn-i = s„-i,£'„ = e„ 
D (Pa (•|6„_i|„_i,e„) ||Pa (•|z„_i,e„)) 



+ D {bn-i\n-i\\zn-i) - E D (A (fe„_ i|„_ i , F„ , e„) ||A(z„_i,y„,e„)) \Sn-i = s„_i,£'„ = e„ 



>0 



Dri-l|n-l- 



(107) 

(108) 
(109) 
where (108) and the non-negativity of the difference follow because

4_l(Sn-l) = 



arginf D (5„_i|„_i ||z„„i) -E (B„|„||$(z„_i)) |5„_i =s„_i,i;„ = e„ 

,i-i<z„-i<*(2,i-2) '- 

arginf D (6„_i|„_i || 



- E 



L'(B„|„||$(6„_i|„_i)) -Eb„|„ 



log 



C^«'(&n-l|n-l) 



d$(z„_i) 



B 



n|n 



n— l|n— 1 



"ra-l|Ti-l 



arginf D (6„_i|„_i || 

-l<2„-l<*(2,i-2) 

L'r"^'" (^") A(&„_i|„_i, y, en){dw)PA (dy|6„-i|„-i, e„) 



arginf D (6„_i|„„i |I 

Z71— 1 ) 

f'„-l|„-l«:2„-l<S*(2,i-2) 



(110) 



(111) 



lOf 



j/GY JtoSW 



/ dA(6„_i|„_i, j/,e„) 
\ dA(0„_i, y, e„) 



^n— l|n— li 



lOf 



•|On-l|n-l) e„j 



'^n — l|n — 1 1 ) 



y^Y cJPa (-IZn-l, e„) 

arginf (Pa (-l&n-iin-i, e„) ||Pa(- 



n)) 



&r.-l|,.-l<2n-l<*(z,>-2) 

D {bn-i\n-i\\zn-i) ^ (. | b„_i|„_ ^ ,g„ ) ( A i|„_ i , F„ , e„) ||A(z„_i,y„,e„))] 



(112) 



(113) 



>o 



where (TM follows because B„|„ < $(fe„-i|„-i) < *(zri-i) and so ^^(^"'"^^ = rfj.(b„_"'"„_,) i$'(^„'i") ' ED follows 
from the definition of the nonlinear filter (I23bt : (I112t follows from the fact that ^ ^^n-i and the definition of the 

nonlinear filter in (l20] i: and the difference in (1113) being non-negative follows from mapping this scenario to that of the hidden 
Markov model and the nonlinear filter: 

• Here, the latent Markov process is $W$ and one observation $y_n$ is recorded.

• In this dynamic programming problem, while in state $s_{n-1}$ and under a fixed $e_n : \mathsf{W} \to \mathsf{X}$, the noisy channel
from $W_n$ to $Y_n$ is the composition of the encoder map $e_n$ and the channel from $X_n$ to $Y_n$: $P_{Y_n|W_n}(dy \mid w_n) = P_{Y|X}(dy \mid e_n(w_n))$.

• Two different decoders both know the statistical dynamics but have different initial beliefs about $W_{n-1}$. One decoder's
initial belief is $b_{n-1|n-1} \in \mathcal{P}(\mathsf{W})$ and the other's is $z_{n-1} \in \mathcal{P}(\mathsf{W})$. The initial 'distance' between the beliefs is measured
by the KL divergence, $D(b_{n-1|n-1} \| z_{n-1})$.

• Both decoders observe $Y_n$ and update their beliefs about $W_n$ according to the one-step nonlinear filter: one updates its
belief according to $\Lambda(b_{n-1|n-1}, y_n, e_n)$ and the other does so according to $\Lambda(z_{n-1}, y_n, e_n)$. The divergence between their
beliefs after the observation is given by $D(\Lambda(b_{n-1|n-1}, Y_n, e_n) \| \Lambda(z_{n-1}, Y_n, e_n))$, and on average this is smaller than
the original due to Jensen's inequality and the second law of thermodynamics for hidden Markov chains. This inequality
is thus a manifestation of how the relative entropy is a 'Lyapunov function' for the stability (e.g. insensitivity to initial
beliefs) of the nonlinear filter [50, Remark 4.2].
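The contraction described in the last bullet can be observed directly on a small finite-state example. The Python sketch below implements a one-step filter (predict with a transition matrix, then apply Bayes' rule through the encoder and channel) for two decoders with different priors, and computes the expected divergence between their posteriors when the observation is generated under the first prior. The three-state chain, the encoder map, and the channel used in the usage note are made-up illustrative values, and the function names are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in nats for finite distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def predict(prior, Q):
    """One-step prediction: push the belief through the Markov kernel Q."""
    n = len(prior)
    return [sum(prior[u] * Q[u][w] for u in range(n)) for w in range(n)]


def one_step_filter(prior, Q, enc, chan, y):
    """Posterior on the new state given output y, and the probability of y."""
    pred = predict(prior, Q)
    joint = [pred[w] * chan[enc[w]][y] for w in range(len(pred))]
    py = sum(joint)
    return [j / py for j in joint], py


def expected_posterior_divergence(b, z, Q, enc, chan):
    """Average over Y (drawn under prior b) of D(filter(b, Y) || filter(z, Y))."""
    total = 0.0
    for y in range(len(chan[0])):
        post_b, py_b = one_step_filter(b, Q, enc, chan, y)
        post_z, _ = one_step_filter(z, Q, enc, chan, y)
        total += py_b * kl(post_b, post_z)
    return total
```

By the chain rule for relative entropy, the returned value equals the divergence between the two predicted beliefs minus the divergence between the two output distributions, and the former is bounded by `kl(b, z)` through the data processing inequality, so the average posterior divergence can never exceed the prior divergence.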

Hence the optimal choice for $Z_{n-1}$ is to pick $b_{n-1|n-1}$. Consequently,



14,-1 (s„-i) = -L'(&„_i|„_i||<I>(2;„-2)) +infaf?(s„-i,e„) +E 

= -D {hn-l\n-l\\'^{Zn-2)) + a^(s„-l , 6* [6„_i|„_i 



(&n-l|n-l: B, 



IB. 



-l|n-l 



-1; En 



E 



^71 — e.*n[bn-l\n- 



(114) 



Using an inductive argument and the exact same set of arguments as above, it follows that for any 1 < fc < n — 1, and any 
encoder policy et+i, the optimal choice for Zk is given by and that for = (zfc-i, bk\k). 



Vu{sk) = -D{bk\kmzk-i))+af]{sk,~el+i[bk\k]) 



E 



Vk+i ((6fc|fc, Bfe+i|fe+i)) \Bk\k — fefe|fe,-Efc+i 



(115) 



For the initial step, $k = 0$, by definition $Z_0(A) = B_{0|0}(A) = P(W_0 \in A)$ and is known to both encoder and decoder. Thus the
minimization is only over $e_1$. For a state $s_0 = (b_{0|0})$ and control $u_0 = (e_1, z_0) = (e_1, b_{0|0})$, we have:
Vo(so) = inf Q;^(so,ei) +E Fi (6o|o,Bi|i) |Bo|o = ^oioi^'i = ei 

ei L J 

= afi{so,el[bQio]) +E X^i (&o|o, ^i|i) |So|o = ^o|o, £'1 = [&o|o] 
Next, from (l33]l we have that I{Wi;Y,\Y'-^) = E[D (Bi|,||<I>(Si_i|i_i))] and thus from (EB we have: 



JZ^, = E [Vo{So)] = min -I{W"; F") + aE 



:;eE 



(116) 
(117) 

(118) 



Lastly, it follows directly that a more 'concise' sufficient statistic exists for the encoder: it does not need to
maintain $Z_{i-1}$ to produce $X_{i+1}$, because under any optimal scheme $Z_{i-1} = B_{i-1|i-1}$ and thus $\sigma(Z_{i-1}) \subset \sigma(B_{i|i})$, so the state
variable $S_i = (Z_{i-1}, B_{i|i})$ can be reduced to $S_i = (B_{i|i})$ with Lemma IV.2 still holding. ■



Appendix F 
Proof of Lemma VI.2



Appendix G 
Proof of Lemma IvOl 



Proof: Note the following standard set of inequalities: 



n 



(119a) 
(119b) 

(119c) 



1 

n ^-^ 

1=1 

1 " 

= "X]^ i^yi\Wi,Yi-'^\\PYi\Y'-APWi,Y--'<-) 
i=\ 

1 " 
1 " 

= -Y^l{X,,Yi) 
1 " 

< -Ec(^,^y|x,E[r7(X0]) (119f) 

1=1 

< C(r,,Pi.|x,L) (119g) 



where (I119ab follows ( l43] i: ( I119bl i follows from the data 
processing inequality; ( 11 19ct follows from Lemma lyTl ( 11 19dt 
follows from the definition of conditional mutual information 
(|9]l and the fact that Xi is a function of Wi and under 
policy e; dl 19e| ) follows from the memoryless nature of the 
channel ( [TT] ) and Jensen's inequality; dl 19fl ) follows from 
( fT2] i; and ( |119g 1 follows from (l42T i and the concavity of the 
capacity-cost function 



Proof: To prove ( 145 a| 



= P. 



,W'^=w'^{dzi) 

-'^ ,W"'=w* ,Yi=y{dZi) 



'^PYi\Z--^=zi-\W"=w"{dy) 



p 



Zi\Zi 



.i-^ ,W" =w'> ,Yi=y{dZi) 

x^V,|Z'-i=z'-Mv-=ju",Xi=g(«.i,z,_i)(c^y)(120) 

'^{z^=d(.^^,^y,)}PY\X=-e(w,,z,^,){dy) (121) 



- Qz'\z,w'{dzi\zi^i,Wi) 



(122) 



where (120) follows from the stationary Markov encoder
policy $X_i = e(w_i, z_{i-1})$; (121) follows from defining
$\mathbb{1}_{\{z_i = d(z_{i-1}, y_i)\}}$ as a Dirac measure at the point $d(z_{i-1}, y_i)$,
the stationary Markov decoder policy $Z_i = d(z_{i-1}, y_i)$, and
the non-anticipative and memoryless nature of the channel
(11); and (122) simply denotes the time-invariant nature of
the conditional distribution.

To prove (45b), we exploit the assumption that $\{Y_i\}$ are
i.i.d. Because of this, we can express $(Z_i : i = 1, \ldots, n)$ as
the following composition of independent random maps:

$$Z_i = d(Z_{i-1}, Y_i) = d_{Y_i}(Z_{i-1}) = d_{Y_i} \circ d_{Y_{i-1}} \circ \cdots \circ d_{Y_1}(Z_0)$$

This is thus an iterated function system (IFS) [62], which
is a time-homogeneous Markov chain over the state space $\mathsf{Z}$.



Appendix H 
Proof of Lemma VII.2

Proof: Let $E_i = W_i - \mathbb{E}[W_i \mid Y^{i-1}]$ be the error term in
estimation. We now select the statistics of $W_0$ such that
$X_i \sim \mathcal{N}(0, L)$ for all $i$. The normalizing coefficient can be expressed as
$\beta_i = \sqrt{L / \mathrm{Cov}(E_i, E_i)}$, where the covariance of the error term can
be recursively computed using

$$\mathrm{Cov}(E_i, E_i) = \begin{cases} \frac{\rho^2 \sigma_v^2}{L + \sigma_v^2}\, \mathrm{Cov}(E_{i-1}, E_{i-1}) + \sigma_w^2, & i \geq 1 \\ \mathrm{Cov}(W_0, W_0), & i = 0 \end{cases} \tag{123}$$

Let the steady state value of the covariance from (123) be
denoted by $C$. Then,

$$C = \frac{\sigma_w^2}{1 - \rho^2 \sigma_v^2 / (L + \sigma_v^2)} \tag{124}$$

Note that because of the choice of $W_0$ in (76), $\mathrm{Cov}(E_i, E_i) = C$
and $\beta_i = \beta = \sqrt{L/C}$ for all $i \geq 0$.

Since all primitive random variables (indexed by $i \ge 1$) are
i.i.d. and Gaussian, and all operations on them are linear,
all random variables are jointly Gaussian. From standard
MMSE estimation theory, $E_i$ is thus independent of $Y^{i-1}$.
As such, clearly $E[E_i | Y^{i-1}] = 0$.
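The independence of the MMSE error from the observations can be illustrated in the simplest jointly Gaussian case. The sketch below assumes a single scalar observation $Y = \beta W + N$ with zero-mean Gaussian $W$ and $N$; the function names and numeric values are hypothetical. It checks both the exact covariance identity and a Monte Carlo estimate.

```python
import random

def mmse_gain(signal_var, beta, noise_var):
    """Linear MMSE gain k for estimating W from Y = beta * W + N."""
    cov_wy = beta * signal_var
    var_y = beta ** 2 * signal_var + noise_var
    return cov_wy / var_y

def empirical_error_observation_cov(signal_var, beta, noise_var, n_samples, seed=0):
    """Monte Carlo estimate of Cov(W - k Y, Y) for zero-mean Gaussian W and N."""
    rng = random.Random(seed)
    k = mmse_gain(signal_var, beta, noise_var)
    total = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, signal_var ** 0.5)
        y = beta * w + rng.gauss(0.0, noise_var ** 0.5)
        error = w - k * y
        total += error * y  # both factors are zero mean, so the average estimates the covariance
    return total / n_samples

signal_var, beta, noise_var = 2.0, 1.0, 1.0
k = mmse_gain(signal_var, beta, noise_var)
# Exact identity: Cov(W - kY, Y) = Cov(W, Y) - k Var(Y) = 0.
exact_cov = beta * signal_var - k * (beta ** 2 * signal_var + noise_var)
sample_cov = empirical_error_observation_cov(signal_var, beta, noise_var, 20000)
print(exact_cov, sample_cov)
```

For jointly Gaussian variables, zero covariance implies independence, which is the step the proof relies on.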

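Since the initial condition $W_0$ is chosen according to (76a),
$X_i \sim \mathcal{N}(0, L)$ for all $i$. Therefore, since the channel noise
has variance $\sigma^2$, the $Y_i$'s are i.i.d. The policies (78) are thus
stationary-Markov coordination strategies:

\begin{align}
X_i &= \beta (W_i - \rho Z_{i-1}) \tag{125a} \\
Z_i &= \rho Z_{i-1} + \gamma Y_i \tag{125b}
\end{align}

where (125a) follows because $E[W_i | Y^{i-1}] = \rho E[W_{i-1} | Y^{i-1}] = \rho Z_{i-1}$,
and (125b) follows by expanding $E[W_i | Y^i]$ using the innovation
sequence and exploiting how the $Y_i$'s are i.i.d. The values of the
parameters $\beta, \gamma$ are given by

\[ \beta = \sqrt{\frac{L}{C}}, \qquad \gamma = \frac{\sqrt{LC}}{L + \sigma^2} \tag{126} \]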
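A small numerical sketch can make the steady-state parameters concrete. It rests on assumptions that are not transcribed from the paper: the source is taken to be scalar Gauss-Markov with coefficient `rho` and a hypothetical innovation variance `innovation_var`, and the error covariance is propagated with the standard scalar Kalman prediction recursion. The gains are then computed as $\beta = \sqrt{L/C}$ and the MMSE gain $\gamma = \beta C / (L + \sigma^2)$.

```python
import math

def steady_state_parameters(rho, L, noise_var, innovation_var, iters=500):
    """Iterate an assumed prediction-error recursion to its fixed point C.

    Assumption (standard scalar Kalman update, not taken from the paper):
    observing Y = beta * E + N with Var(beta * E) = L shrinks the error
    variance by noise_var / (L + noise_var); the source dynamics then scale
    it by rho**2 and add innovation_var.
    """
    shrink = noise_var / (L + noise_var)
    c = innovation_var  # any positive start converges when rho**2 * shrink < 1
    for _ in range(iters):
        c = rho ** 2 * shrink * c + innovation_var
    beta = math.sqrt(L / c)             # normalizes the channel input power to L
    gamma = beta * c / (L + noise_var)  # MMSE gain applied to the channel output
    return c, beta, gamma

c, beta, gamma = steady_state_parameters(rho=0.9, L=1.0, noise_var=1.0, innovation_var=1.0)
print(c, beta, gamma, beta * gamma)
```

Under these assumptions the product `beta * gamma` equals $L / (L + \sigma^2)$ regardless of the value of $C$.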



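Note that from the definition of $C$ in (124),
$P_{W_i|Z_{i-1}=z_{i-1}} \sim \mathcal{N}(\rho z_{i-1}, C)$. Hence, using (79),

\begin{align*}
Q_{Z'|Z,W'}(\cdot|z_{i-1},w_i) &\sim \mathcal{N}\left(\rho z_{i-1} + \beta\gamma(w_i - \rho z_{i-1}),\ \gamma^2\sigma^2\right) \\
Q_{Z'|Z}(\cdot|z_{i-1}) &\sim \mathcal{N}\left(\rho z_{i-1},\ \gamma^2(L + \sigma^2)\right)
\end{align*}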



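From Theorem VI.5, the linear stationary Markov coordination
strategy (78) is inverse control optimal for a $\rho$ of the
form

\begin{align}
\rho(w_i, z_{i-1}, z_i) &\propto_+ -\log \frac{dQ_{Z'|Z,W'}(\cdot|z_{i-1},w_i)}{dQ_{Z'|Z}(\cdot|z_{i-1})}(z_i) \nonumber \\
&\propto_+ \frac{\left(z_i - \rho z_{i-1} - \beta\gamma(w_i - \rho z_{i-1})\right)^2}{\gamma^2\sigma^2} - \frac{(z_i - \rho z_{i-1})^2}{\gamma^2(L + \sigma^2)} \nonumber \\
&\propto_+ (z_i - w_i)^2 - \frac{\sigma^2}{L + \sigma^2}(w_i - \rho z_{i-1})^2 \tag{127}
\end{align}

where (127) follows from (126). Similarly, the power-like
cost for inverse control optimality is given by
$\eta(x) \propto_+ D(P_{Y|X=x} \| P_Y) = D(P_V(\cdot - x) \| P_Y(\cdot)) \propto_+ x^2$.
Thus we have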



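\begin{align}
E\left[\frac{1}{n}\sum_{i=1}^{n} \rho(W_i, Z_{i-1}, Z_i)\right] &\propto_+ E\left[\cdots\right] \tag{128} \\
&\propto_+ E\left[\cdots\right] \tag{129} \\
&\propto_+ E\left[\cdots\right] \tag{130}
\end{align}

where (128) follows from (80b); (129) follows from (76b);
and (130) follows from (76c). ■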
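The two cost equivalences used in this proof can be checked numerically. The sketch below is a sanity check under stated assumptions, not the paper's derivation: it takes the Gaussian kernels $\mathcal{N}(\rho z_{i-1} + \beta\gamma(w_i - \rho z_{i-1}), \gamma^2\sigma^2)$ and $\mathcal{N}(\rho z_{i-1}, \gamma^2(L+\sigma^2))$, assumes $\beta = \sqrt{L/C}$ and $\gamma = \sqrt{LC}/(L+\sigma^2)$, and uses hypothetical numeric values (including `C = 1.7`). It verifies that the negative log-likelihood ratio is a positive affine function of the quadratic cost in (127), and that the divergence $D(P_{Y|X=x} \| P_Y)$ is affine in $x^2$.

```python
import math
import random

def log_gauss_pdf(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def info_gain_cost(w, z_prev, z, rho, beta, gamma, L, noise_var):
    """Negative log-likelihood ratio -log dQ_{Z'|Z,W'} / dQ_{Z'|Z} at z."""
    mean_given_w = rho * z_prev + beta * gamma * (w - rho * z_prev)
    log_num = log_gauss_pdf(z, mean_given_w, gamma ** 2 * noise_var)
    log_den = log_gauss_pdf(z, rho * z_prev, gamma ** 2 * (L + noise_var))
    return -(log_num - log_den)

def quadratic_cost(w, z_prev, z, rho, L, noise_var):
    """Quadratic form on the last line of (127)."""
    return (z - w) ** 2 - noise_var / (L + noise_var) * (w - rho * z_prev) ** 2

def kl_gauss(m1, v1, m2, v2):
    """KL divergence D(N(m1, v1) || N(m2, v2)) in nats."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

# Hypothetical parameter values chosen for illustration.
rho, L, noise_var, C = 0.9, 1.0, 1.0, 1.7
beta = math.sqrt(L / C)
gamma = math.sqrt(L * C) / (L + noise_var)

# Positive scale and constant offset of the affine relation.
scale = L / (2.0 * gamma ** 2 * noise_var * (L + noise_var))
offset = 0.5 * math.log(noise_var / (L + noise_var))

rng = random.Random(1)
max_gap = 0.0
for _ in range(1000):
    w, z_prev, z = (rng.uniform(-3.0, 3.0) for _ in range(3))
    lhs = info_gain_cost(w, z_prev, z, rho, beta, gamma, L, noise_var)
    rhs = scale * quadratic_cost(w, z_prev, z, rho, L, noise_var) + offset
    max_gap = max(max_gap, abs(lhs - rhs))

# Power-like cost: D(N(x, noise_var) || N(0, L + noise_var)) minus its value at x = 0.
base = kl_gauss(0.0, noise_var, 0.0, L + noise_var)
power_gap = max(
    abs(kl_gauss(x, noise_var, 0.0, L + noise_var) - base - x ** 2 / (2.0 * (L + noise_var)))
    for x in [-2.0, -0.5, 0.0, 1.0, 3.0]
)
print(max_gap, power_gap)
```

Both gaps are at the level of floating-point rounding, which is consistent with the two costs being equal up to a positive scaling and an additive constant.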